Abstract


Effective reasoning about complex physical systems requires the use of models that
are adequate for the task. Constructing such adequate models is often difficult. In
this dissertation, we address this difficulty by developing efficient techniques for
automatically selecting adequate models of physical systems. We focus on the important
task of generating parsimonious causal explanations for phenomena of interest.
Formally, we propose answers to the following: (a) what is a model and what is the space
of  possible  models;  (b)  what  is  an  adequate  model;  and  (c)  how  do  we  find  adequate 
models. 

We  define  a  model  as  a  set  of  model  fragments,  where  a  model  fragment  is  a  set 
of  independent  equations  that  partially  describes  some  physical  phenomenon.  The 
space  of  possible  models  is  defined  implicitly  by  the  set  of  applicable  model  fragments: 
different  subsets  of  this  set  correspond  to  different  models.  An  adequate  model  is 
defined  as  a  simplest  model  that  can  explain  the  phenomenon  of  interest,  and  that 
satisfies  any  domain-independent  and  domain-dependent  constraints  on  the  structure 
and  behavior  of  the  physical  system. 

We  show  that,  in  general,  finding  an  adequate  model  is  intractable  (NP-hard). 

We address this intractability by introducing a set of restrictions, and use these
restrictions to develop an efficient algorithm for finding adequate models. The most
significant restriction is that all the approximation relations between model fragments
are required to be causal approximations. In practice this is not a serious restriction
because most commonly used approximations are causal approximations.

We also develop a novel order of magnitude reasoning technique, which strikes a
balance between purely qualitative and purely quantitative methods. The order of
magnitude of a parameter is defined on a logarithmic scale, and a set of rules propagates
orders of magnitude through equations. A novel feature of these rules is that they
effectively handle non-linear simultaneous equations, using linear programming in
conjunction with backtracking.

The techniques described in this dissertation have been implemented and have
been tested on a variety of electromechanical devices. These tests provide empirical
evidence for the theoretical claims of the dissertation.
Chapter 1
Introduction 


One  of  the  earliest  important  ideas  in  Artificial  Intelligence  is  that  effective  problem 
solving requires the use of adequate models of the domain [Amarel, 1968]. Adequate
models incorporate abstractions and approximations that are well suited to
the problem solving task. In most Artificial Intelligence research, models are
handcrafted by a user. The user must decide what domain phenomena are relevant, and
must select appropriate abstractions and approximations that adequately describe
these phenomena. In most real-world domains, constructing such models is a difficult,
error-prone, and time-consuming task. Automating the construction of adequate
models  overcomes  these  drawbacks  and  provides  future  intelligent  programs  with  a 
useful  modeling  tool.  In  this  thesis  we  investigate  the  problem  of  selecting  adequate 
models  in  the  domain  of  physical  systems. 


1.1  Models  and  tasks 

Consider,  for  example,  the  schematic  of  a  bimetallic  strip  temperature  gauge,  from 
[Macaulay,  1988],  shown  in  Figure  1.1.  This  temperature  gauge  consists  of  a  battery,  a 
wire,  a  bimetallic  strip,  a  pointer,  and  a  thermistor.  A  thermistor  is  a  semi-conductor 
device;  a  small  increase  in  its  temperature  causes  a  large  decrease  in  its  resistance.  A 
bimetallic  strip  has  two  strips  made  of  different  metals  welded  together.  Temperature 
changes cause the two strips to expand by different amounts, causing the bimetallic
strip to bend.

Figure 1.1: A temperature gauge (figure label: Container of water)

Consider,  now,  the  task  of  explaining  how  the  temperature  gauge  works,  i.e.,  how 
the  temperature  of  the  thermistor  determines  the  position  of  the  pointer  along  the 
scale.  A  trained  engineer  is  able  to  look  at  this  schematic  for  a  few  moments,  and 
provide the following explanation: the thermistor senses the water temperature. The
thermistor’s temperature determines the thermistor’s resistance, which determines
the  current  flowing  in  the  circuit.  This  determines  the  amount  of  heat  dissipated  in 
the  wire,  which  determines  the  temperature  of  the  bimetallic  strip.  The  temperature 
of  the  bimetallic  strip  determines  its  deflection,  which  determines  the  position  of  the 
pointer  along  the  scale. 

A  crucial  part  of  how  the  engineer  constructs  the  above  explanation  is  his  or 
her  ability  to  pick  out  just  the  relevant  phenomena  that  needed  to  be  modeled.  In 
particular,  the  engineer  decided  that  the  important  thing  to  model  about  the  wire  is 
that  it  generates  heat  as  current  flows  through  it.  The  explanation  is  not  cluttered 
by  references  to  irrelevant  phenomena,  such  as  the  electromagnetic  field  generated  by 
the  current  flow  in  the  wire. 

Now,  consider  a  slightly  different  task:  the  task  of  explaining  how  the  atmospheric 
temperature  affects  the  working  of  the  temperature  gauge,  i.e.,  how  the  temperature 
of the atmosphere affects the position of the pointer along the scale. The engineer’s
explanation  would  be  as  follows:  the  temperature  of  the  atmosphere  determines  the 
temperature  of  the  bimetallic  strip,  which  determines  the  deflection  of  the  bimetallic 
strip.  The  amount  of  the  deflection  determines  the  position  of  the  pointer  along  the 
scale. 

In constructing the above explanation, the engineer completely disregarded the
electrical  properties  of  the  wire,  the  battery,  and  the  thermistor.  Modeling  these 
phenomena  is  not  relevant  to  explaining  how  the  temperature  of  the  atmosphere 
affects  the  position  of  the  pointer  along  the  scale. 

In addition to being able to decide which phenomena must be modeled, the engineer
is also able to identify just the right model for each relevant phenomenon. For
example, in modeling electrical conduction in the wire, the engineer had to choose
between modeling it as an ideal conductor, a constant resistance resistor, or a resistor
whose resistance depends on its temperature. The engineer chose the constant
resistance resistor model because (a) no heat is dissipated by an ideal conductor, and
hence modeling the wire’s resistance is crucial to understanding how the temperature
gauge works; and (b) modeling the dependence of the wire’s resistance on its
temperature is unnecessary: assuming that the resistance is constant is adequate for
explaining  the  temperature  gauge’s  functioning. 

Of course, what is meant by “just the right model” for each relevant phenomenon
is  task  dependent.  For  example,  consider  the  following  analysis  task:  predict  the 
position  of  the  pointer  along  the  scale  for  a  particular  thermistor  temperature.  If  a 
high  fidelity  prediction  is  required,  i.e.,  if  the  pointer’s  position  must  be  predicted  with 
high  accuracy,  then  the  engineer  would  model  the  dependence  of  the  wire’s  resistance 
on  its  temperature.  On  the  other  hand,  if  a  lower  fidelity  prediction  is  acceptable,  the 
engineer  would  once  again  use  the  simpler,  constant  resistance  model  for  the  wire, 
thereby  simplifying  the  prediction  process. 

What makes the above modeling decisions particularly intriguing is that there is
usually a very large space of possible models to choose from. Figure 1.2 shows part
of the space of possible models of a wire. We can choose to model its electrical,
electromagnetic, or thermal properties, or we can choose to model its expansion or
rotation. If we choose to model it as an electrical conductor, we must choose between
modeling it as an ideal conductor, or as a resistor, in which case we must choose
between modeling the resistance as a constant, or as dependent on the temperature.
In addition, we can choose to model the heat generated in the wire due to current
flow.

Figure 1.2: The possible models of a wire. (Legible node labels: ideal-conductor, constant-resistor, thermal-resistor, thermally-expanding-wire, rigid-rotating-wire, torsion-spring; one further wire label is illegible.)

All  the  parts  of  the  temperature  gauge  have  a  similarly  large  set  of  possible  models. 
Hence,  the  set  of  possible  models  of  the  temperature  gauge,  constructed  by  selecting 
an appropriate subset of models for each of its parts, is combinatorially large. And
yet,  an  engineer,  after  only  a  little  thought,  is  able  to  select  an  adequate  model  that 
is  specifically  tailored  for  each  task. 


1.2  Problem  statement 

This  thesis  is  about  automating  the  engineer’s  ability  to  select  adequate  models  for 
specific  tasks.  We  cast  the  problem  of  selecting  adequate  models  as  a  search  problem. 
To  do  this,  we  must  answer  the  following  three  questions: 

•  What  is  a  model,  and  what  is  the  space  of  possible  models?  (What  is  the  search 
space?) 

•  What  is  an  adequate  model?  (What  is  the  goal  criterion?) 
•  How do we search the space of possible models for adequate models? (What is
the  search  strategy?) 

The  thesis  proposes  answers  to  each  of  the  above  questions  in  the  domain  of 
physical  systems  and  for  the  task  of  generating  parsimonious  causal  explanations. 


1.3  Proposed  solution:  An  overview 

In  this  section  we  give  a  brief  overview  of  our  answers  to  the  questions  raised  in  the 
previous  section.  The  rest  of  the  thesis  develops  these  ideas  in  detail. 

1.3.1  What  is  a  model  and  what  is  the  space  of  possible 
models? 

In  this  thesis,  we  will  be  concerned  with  models  of  the  behavior  of  physical  systems. 
Such  models  are  best  expressed  as  a  set  of  algebraic  and/or  differential  equations, 
that  describe  various  phenomena  of  interest.  Hence,  a  model  is  a  set  of  equations. 
However,  rather  than  viewing  a  model  as  just  a  set  of  equations,  we  will  view  it  as  a 
set  of  model  fragments.  A  model  fragment  is  a  set  of  equations  that  partially  describe 
a  single  phenomenon,  usually  a  single  mechanism.  For  example, 

$\{V_w = i_w R_w\}$

is a model fragment describing electrical conduction in the wire. Note that it is a
partial description of electrical conduction, since it does not include any description
of the variation of the resistance $R_w$. Figure 1.3 shows the model fragments, and
associated equations, in a possible model of the temperature gauge shown in Figure 1.1.
Model  fragments  provide  an  appropriate  level  of  description:  (a)  they  are  much  easier 
to  create  than  complete  models;  (b)  unlike  complete  models,  they  are  significantly 
more reusable; and (c) not all meaningful physical phenomena can be represented by
a  single  equation. 

Linkage(bms, ptr): $\theta_p = k_1 x_b$
Thermal-bms(bms): $x_b = k_2 T_b$
Heat-flow(bms, atm): $f_{ba} = k_3 (T_b - T_a)$
Heat-flow(wire, bms): $f_{wb} = k_4 (T_w - T_b)$
Constant-temperature(atm): exogenous($T_a$)
Thermal-equilibrium(bms): $f_{ba} = f_{wb}$
Thermal-equilibrium(wire): $f_{wb} = f_w$
Resistor(wire): $V_w = i_w R_w$
Constant-resistance(wire): exogenous($R_w$)
Thermal-resistance(wire): $f_w = V_w i_w$
Electrical-thermistor(thermistor): $V_t = i_t R_t$; $R_t =$ [illegible]
Constant-voltage-source(battery): exogenous($V_v$)
Kirchhoff’s laws: $V_v = V_w + V_t$; $i_v = i_t$; $i_t = i_w$
Input: exogenous($T_t$)

$\theta_p$: Pointer angle                 $x_b$: Bms deflection
$R_w$: Wire resistance               $R_t$: Thermistor resistance
$i_t$: Thermistor current            $V_t$: Thermistor voltage
$i_w$: Wire current                  $V_w$: Wire voltage
$i_v$: Battery current               $V_v$: Battery voltage
$T_b$: Bms temperature               $T_a$: Atm temperature
$T_w$: Wire temperature              $T_t$: Thermistor temperature
$f_{ba}$: Heat flow (bms to atm)     $f_{wb}$: Heat flow (wire to bms)
$f_w$: Heat generated in wire        $k_j$: Exogenous constants

Figure 1.3: A possible model of the temperature gauge

The space of possible models is defined implicitly by the set of model fragments
that can be composed to form models. The set of model fragments that can be so
composed is defined by the structure of the physical system. The structure of the
physical  system  is  a  description  of  the  parts  of  the  system,  and  how  they  are  put 
together.  The  parts  that  can  be  used  to  describe  a  system’s  structure  are  drawn  from 
a  component  library.  Each  component  in  this  library  is  associated  with  a  set  of  model 
fragments,  describing  different  aspects  of  the  component’s  behavior.  For  example,  the 
set of model fragments associated with a wire would include those shown in Figure 1.2.
A  model  of  the  physical  system  is  a  subset  of  the  model  fragments  associated  with 
each  of  the  components  in  the  system. 

Hence,  our  answer  to  the  first  question  is: 

•  A  model  is  a  set  of  model  fragments. 

•  The space of possible models of the physical system is defined by the structure
of  the  system  and  a  component  library. 
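
As a minimal sketch of this answer, assuming a toy component library, the space of possible models can be enumerated in Python as the subsets of the applicable model fragments. The names `COMPONENT_LIBRARY`, `STRUCTURE`, `applicable_fragments`, and `possible_models` are hypothetical and are not taken from the implementation described later; the sketch also applies no consistency checks, so it admits subsets that an engineer would reject.

```python
from itertools import chain, combinations

# Hypothetical component library: component class -> names of its model fragments.
COMPONENT_LIBRARY = {
    "wire": ["ideal-conductor", "constant-resistor", "thermal-resistor"],
    "battery": ["constant-voltage-source"],
}

# Hypothetical structure: component instance -> component class.
STRUCTURE = {"wire-1": "wire", "battery-1": "battery"}


def applicable_fragments(structure, library):
    """List (fragment, instance) pairs for every component in the structure."""
    return [
        (fragment, instance)
        for instance, component_class in structure.items()
        for fragment in library[component_class]
    ]


def possible_models(fragments):
    """Return every subset of the applicable fragments as a candidate model."""
    subsets = (combinations(fragments, size) for size in range(len(fragments) + 1))
    return [frozenset(subset) for subset in chain.from_iterable(subsets)]


fragments = applicable_fragments(STRUCTURE, COMPONENT_LIBRARY)
models = possible_models(fragments)
print(len(fragments), len(models))  # 4 fragments give 2**4 = 16 candidate models
```

Even this two-component toy yields 16 candidates, which illustrates why explicit enumeration does not scale to realistic devices.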


1.3.2  What  is  an  adequate  model? 

We define the adequacy of a model using three criteria: (a) the task; (b) domain
dependent  constraints;  and  (c)  simplicity. 

The  task 

The  adequacy  of  a  model  can  only  be  determined  with  respect  to  a  task.  In  this 
thesis,  we  will  be  concentrating  on  the  task  of  providing  causal  explanations  for  a 
phenomenon  of  interest.  A  causal  explanation  is  an  explanation  in  terms  of  the 
underlying  causal  mechanisms  of  the  domain.  For  example,  the  explanations  in  the 
previous  section  were  causal  explanations.  We  have  chosen  this  task  because  of  its 
importance  in  reasoning  about  physical  systems.  Weld  and  de  Kleer  [Weld  and  de 
Kleer,  1990,  page  612]  summarize  its  importance  as  follows: 

. . . humans expect to be provided explanations in causal terms. . . . Part
of the motivation for developing a theory of causality is as a vehicle for a
system to explain its conclusions.

Many qualitative physics researchers have adopted the far stronger position
that causality is fundamental and plays a central role in reasoning
about physical systems. . . . Causal explanations are important to engineers
because they are an explicit representation of how a device achieves
its behavior. This explanation itself forms the basis for subsequent reasoning.
In design tasks it is important to reason backward from effects to
causes to identify what changes to make to a device to better achieve its
specifications. In diagnosis tasks, it is important to reason backward to
pinpoint what could have caused the symptoms. The causal explanation
can guide subsequent quantitative analysis . . .

Figure 1.4: Causal ordering of the parameters

Hence, given a phenomenon of interest, the fundamental criterion for the adequacy
of a model is whether or not it is able to provide a causal explanation of the
phenomenon. To check whether or not a model can provide an explanation for a
phenomenon, we generate the causal ordering [de Kleer and Brown, 1984; Forbus, 1984;
Williams, 1984; Iwasaki and Simon, 1986b; Iwasaki, 1988] of the parameters of the
model using the equations of the model. The causal ordering of the parameters is
a dependency ordering of the parameters that reflects an engineer’s notion of causal
dependence between the parameters. The causal ordering is used to check whether
or  not  a  model  can  provide  an  explanation  for  a  phenomenon. 

For example, suppose we want to explain how the temperature gauge in Figure 1.1
works, i.e., to explain how the temperature of the thermistor ($T_t$) causally determines
the angular position of the pointer ($\theta_p$). Figure 1.4 shows the causal ordering
generated from the model in Figure 1.3. Since $\theta_p$ is causally dependent on $T_t$ in this
causal ordering, the model in Figure 1.3 is adequate for the task of explaining how
the temperature gauge in Figure 1.1 works.
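
The dependence check itself can be sketched as graph reachability. The Python fragment below assumes that a causal ordering is already available as a directed graph; the edges in `CAUSAL_EDGES` are a simplification transcribed from the engineer's explanation in Section 1.1, not from Figure 1.4, and the function name `causally_depends` is hypothetical. Generating the ordering from the equations is not shown.

```python
from collections import deque

# Assumed edges, read off the narrative explanation: each parameter maps to
# the parameters it directly determines.
CAUSAL_EDGES = {
    "T_t": ["R_t"],      # thermistor temperature -> thermistor resistance
    "R_t": ["i_w"],      # resistance -> current in the circuit
    "i_w": ["f_w"],      # current -> heat generated in the wire
    "f_w": ["T_b"],      # heat -> bimetallic strip temperature
    "T_b": ["x_b"],      # strip temperature -> strip deflection
    "x_b": ["theta_p"],  # deflection -> pointer angle
}


def causally_depends(edges, effect, cause):
    """Return True if `effect` is reachable from `cause` in the ordering."""
    seen = {cause}
    queue = deque([cause])
    while queue:
        node = queue.popleft()
        if node == effect:
            return True
        for successor in edges.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


print(causally_depends(CAUSAL_EDGES, "theta_p", "T_t"))  # True
print(causally_depends(CAUSAL_EDGES, "T_t", "theta_p"))  # False
```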

Domain  dependent  constraints 

In  addition  to  requiring  that  an  adequate  model  be  able  to  explain  the  phenomenon 
of interest, an engineer may want it to satisfy a set of domain dependent constraints.
Such  constraints  can  stem  from  the  structure  and  the  behavior  of  the  physical  system. 
For  example,  the  following  constraint: 

(implies
  (and (Electromagnet ?object)
       (Wire ?object)
       (coiled-around ?object ?core)
       (magnetic-material ?core))
  (Magnet ?core))

requires  that  if  the  electromagnetic  field  generated  by  a  wire  is  modeled  and  the  wire 
is  coiled  around  a  core  made  of  a  magnetic  material,  then  the  core  must  be  modeled 
as  a  magnet.  The  justification  for  this  domain  dependent  constraint  is  that  the  core 
amplifies  the  magnetic  field  by  three  or  four  orders  of  magnitude,  converting  the  core 
into  a  powerful  magnet.  Hence,  under  these  circumstances,  an  engineer  would  not 
consider  the  model  to  be  adequate  unless  the  core  were  modeled  as  a  magnet.  More 
generally,  an  adequate  model  must  satisfy  all  such  domain  dependent  constraints. 
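
To make the constraint concrete, here is a minimal Python sketch that checks it against a set of ground facts. Representing facts as tuples and the function name `magnet_constraint_violations` are assumptions made for illustration; the constraint language used in the thesis is the one shown above.

```python
def magnet_constraint_violations(facts):
    """Return (object, core) pairs whose premises hold but (Magnet core) is absent.

    `facts` is a set of tuples such as ("Wire", "w1") or
    ("coiled-around", "w1", "c1").
    """
    violations = []
    for fact in sorted(facts):
        if fact[0] != "coiled-around":
            continue
        _, obj, core = fact
        premises = [
            ("Electromagnet", obj),
            ("Wire", obj),
            ("magnetic-material", core),
        ]
        if all(p in facts for p in premises) and ("Magnet", core) not in facts:
            violations.append((obj, core))
    return violations


facts = {
    ("Electromagnet", "w1"),
    ("Wire", "w1"),
    ("coiled-around", "w1", "c1"),
    ("magnetic-material", "c1"),
}
print(magnet_constraint_violations(facts))                        # [('w1', 'c1')]
print(magnet_constraint_violations(facts | {("Magnet", "c1")}))   # []
```

A model whose facts produce an empty list satisfies this particular constraint; an adequate model must do so for every constraint in the knowledge base.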

Simplicity 

Not all explanations of a phenomenon are parsimonious. A parsimonious causal
explanation is a causal explanation with a minimum of irrelevant detail. Irrelevant detail
is introduced into explanations because either (a) irrelevant phenomena are modeled;
or (b) needlessly complex models of relevant phenomena are used. For example, we
could introduce irrelevant detail into an explanation of how the temperature gauge
in Figure 1.1 works by modeling the electromagnetic field generated by the wire. We
could also introduce irrelevant detail into this explanation by modeling the temperature
dependence of the wire’s resistance, since approximating the wire’s resistance by
assuming that it is constant is adequate for explaining how the temperature gauge
works.

To minimize the amount of irrelevant detail in an explanation, the model generating
the explanation must be as simple as possible. The notion of model simplicity
that supports the generation of parsimonious causal explanations is based on a primitive
approximation relation between model fragments. The intuition underlying our
definition  of  model  simplicity  is  that  modeling  fewer  phenomena  more  approximately 
leads  to  simpler  models.  An  adequate  model  is  required  to  be  as  simple  as  possible 
according  to  this  ordering. 

Hence,  our  answer  to  the  second  question  is: 

•  An  adequate  model 

-  is  able  to  provide  causal  explanations  for  the  phenomenon  of  interest; 

-  satisfies  any  domain  dependent  constraints;  and 

-  is  as  simple  as  possible. 

Let  us  say  that  a  model  is  a  causal  model,  with  respect  to  a  phenomenon  of  interest, 
if  and  only  if  it  is  able  to  explain  the  phenomenon  and  if  the  domain  dependent 
constraints are satisfied. Hence, an adequate model is a minimal causal model, i.e., a
causal  model  such  that  no  simpler  model  is  a  causal  model. 

1.3.3  How  do  we  find  adequate  models? 

Given the structure of the physical system and a component library, there is an
exponentially large space of possible models of the physical system. We will show later
that the problem of finding an adequate model in this space of possible models is
intractable (NP-hard). Intuitively, this means that, to find an adequate model, we
can do little better than check each model in the exponentially large space of possible
models. Even for small systems, this space is extremely large, so any brute force
approach is out of the question. However, this seems to contradict the observation that
expert engineers are able to provide parsimonious causal explanations for phenomena
after only a little bit of thought. This means that the world provides additional structure,
which can be exploited to develop an efficient, polynomial time model selection
algorithm.

Figure 1.5: Algorithm for finding a minimal causal model.

Upward  failure  property 

One  property  that  is  likely  to  be  satisfied  in  modeling  the  physical  world  is  the 
upward  failure  property.  The  upward  failure  property  states  that  if  a  model  is  not  a 
causal  model,  then  no  simpler  model  is  a  causal  model.  Intuitively,  this  seems  like 
a  reasonable  property.  After  all,  if  a  model  is  unable  to  explain  the  phenomenon  of 
interest, then there is little reason to believe that a simpler model is able to provide an
explanation.  If  the  upward  failure  property  is  satisfied,  then,  given  an  initial  causal 
model,  the  algorithm  shown  in  Figure  1.5  can  be  used  to  efficiently  find  an  adequate 
model,  i.e.,  a  minimal  causal  model.  In  this  algorithm,  M  is  the  initial  causal  model, 
with  an  immediate  simplification  of  M  being  produced  by  either  replacing  a  model 
fragment  in  M  by  an  immediate  approximation,  or  by  dropping  a  model  fragment. 
The  algorithm  works  by  continually  replacing  M  by  an  immediate  simplification  that 
is a causal model, until all the immediate simplifications of M are not causal models.
The  upward  failure  property  then  tells  us  that  M  is  a  minimal  causal  model. 
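
The loop just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of Figure 1.5 itself: models are sets of fragment names, and `IMMEDIATE_APPROXIMATION`, `REQUIRED`, `immediate_simplifications`, and the toy `is_causal_model` predicate are all hypothetical stand-ins chosen so that the upward failure property holds.

```python
# Hypothetical immediate-approximation relation between fragment names.
IMMEDIATE_APPROXIMATION = {
    "thermal-resistor": "constant-resistor",
    "constant-resistor": "ideal-conductor",
}

# Toy adequacy test: these fragments must be present, plus some resistor model.
REQUIRED = {"thermal-bms", "linkage"}


def is_causal_model(model):
    """Toy stand-in for the causal-ordering and constraint checks."""
    has_resistance = bool(model & {"constant-resistor", "thermal-resistor"})
    return has_resistance and REQUIRED <= model


def immediate_simplifications(model):
    """Yield models obtained by dropping or approximating a single fragment."""
    for fragment in sorted(model):
        yield model - {fragment}
        approximation = IMMEDIATE_APPROXIMATION.get(fragment)
        if approximation is not None:
            yield (model - {fragment}) | {approximation}


def find_minimal_causal_model(model):
    """Replace the model by a causal immediate simplification until none exists."""
    current = model
    while True:
        for candidate in immediate_simplifications(current):
            if is_causal_model(candidate):
                current = candidate
                break
        else:
            # No immediate simplification is causal; under the upward failure
            # property, `current` is therefore a minimal causal model.
            return current


initial = frozenset({"thermal-resistor", "thermal-bms", "linkage", "electromagnet"})
print(sorted(find_minimal_causal_model(initial)))
# ['constant-resistor', 'linkage', 'thermal-bms']
```

In the toy run the irrelevant electromagnet fragment is dropped and the thermal resistor is approximated by a constant resistor, while the ideal conductor is rejected because it cannot account for the heat generated in the wire.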
Causal approximations

The upward failure property is useful because it leads to a polynomial time algorithm
for finding an adequate model. However, checking whether or not the space of
possible  models  satisfies  the  upward  failure  property  is,  in  general,  difficult.  This  is 
because  the  upward  failure  property  is  a  global  property.  To  address  this  shortcoming 
we  have  identified  a  set  of  local  properties,  which  can  be  easily  checked  as  we  build 
up  the  component  library,  that  entail  the  upward  failure  property.  In  particular,  we 
have  identified  an  important  class  of  approximations  called  causal  approximations. 
When  all  the  approximations  are  causal  approximations,  replacing  a  model  fragment 
by  a  more  accurate  model  fragment  results  in  a  superset  of  causal  relations  between 
parameters.  This  forms  the  basis  for  proving  the  upward  failure  property,  and  the  use 
of  the  algorithm  shown  in  Figure  1.5.  Causal  approximations  are  particularly  useful 
because  they  are  common  in  modeling  the  physical  world.  For  example,  Table  1.1 
shows a number of commonly used approximations, all of which are causal approximations.
These approximations are described in greater detail in Appendix A. The
exact  definition  of  a  causal  approximation  is  found  in  Chapter  5. 


Inertialess objects                           Inviscid flow
Rigid bodies                                  Frictionless motion
Elastic collisions                            Ideal gas law
Zero or constant gravity                      Ideal heat engines
Non-relativistic mass and motion              No thermal expansion
Ideal thermal insulators and conductors       Constant thermal conductance
Ideal electrical insulators and conductors    Constant resistance and resistivity

Table 1.1: Examples of causal approximations


Finding  an  initial  causal  model 

The  algorithm  in  Figure  1.5  requires  us  to  find  an  initial  causal  model  M,  from  which 
to  start  the  simplification.  A  natural  choice  for  this  model  is  the  most  accurate  model 
describing  the  physical  system.  However,  starting  with  the  most  accurate  model  is 
often  undesirable.  Hence,  we  introduce  a  heuristic  method,  based  on  the  component 
interaction heuristic, that allows us to find an initial causal model. For example, one
component interaction heuristic is the following:

(implies
  (and (terminals ?object ?term1)
       (voltage-terminal ?term1)
       (connected-to ?term1 ?term2))
  (voltage-terminal ?term2))

which  says  that  if  any  terminal  of  a  component  is  modeled  as  a  voltage  terminal, 
then  all  terminals  connected  to  that  voltage  terminal  must  also  be  modeled  as  voltage 
terminals.  This  allows  the  components  corresponding  to  the  connected  terminals  to 
interact  by  sharing  voltages  at  those  terminals.  Note  that  the  above  constraint  does 
not  require  all  connected  terminals  to  be  modeled  as  voltage  terminals;  it  only  says 
that  if  a  terminal  is  a  voltage  terminal,  then  terminals  connected  to  it  must  also  be 
voltage  terminals.  We  use  such  heuristic  constraints  to  build  up  a  causal  model,  and 
then  use  the  algorithm  in  Figure  1.5  to  find  a  minimal  causal  model. 
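
One way to read this heuristic operationally is as a closure computation over the connection graph. The Python sketch below assumes terminals are named by strings, connections are given as pairs, and `connected-to` is symmetric; the function name `propagate_voltage_terminals` and the terminal names are hypothetical.

```python
def propagate_voltage_terminals(voltage_terminals, connections):
    """Close a set of voltage terminals under the connected-to relation.

    `connections` is an iterable of (terminal, terminal) pairs, treated here
    as symmetric (an assumption of this sketch).
    """
    result = set(voltage_terminals)
    changed = True
    while changed:
        changed = False
        for a, b in connections:
            for source, target in ((a, b), (b, a)):
                if source in result and target not in result:
                    result.add(target)
                    changed = True
    return result


connections = [
    ("battery.t1", "thermistor.t1"),
    ("thermistor.t2", "wire.t1"),
    ("wire.t2", "battery.t2"),
]
print(sorted(propagate_voltage_terminals({"battery.t1"}, connections)))
# ['battery.t1', 'thermistor.t1']
print(propagate_voltage_terminals(set(), connections))
# set()
```

The second call reflects the note above: if no terminal is modeled as a voltage terminal, the heuristic forces nothing.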

In  summary,  the  answer  to  the  third  question  is  as  follows: 

•  When  all  the  approximations  are  causal  approximations,  an  adequate  model 
can  be  found  efficiently  by  first  identifying  an  initial  causal  model,  and  then 
simplifying  it. 


1.4  Contributions 

The  thesis  makes  the  following  important  contributions: 

•  It  introduces  a  novel  criterion  for  defining  model  adequacy:  the  criterion  that 
a model must be able to provide a parsimonious causal explanation for a phenomenon
of interest.

•  It presents a clear formalization of the model selection problem, making the
problem  amenable  to  theoretical  analysis. 
•  It uses the above formalization to analyze the complexity of finding adequate
models, and shows that the problem is intractable. This analysis yields three
different sources of intractability, which can be summarized as follows: (a) deciding
what phenomena to model; (b) deciding how to model selected phenomena;
and  (c)  having  to  satisfy  all  the  domain  dependent  constraints. 

•  It  introduces  a  new  class  of  approximations,  called  causal  approximations,  which 
are  commonly  found  in  modeling  the  physical  world.  Causal  approximations  are 
important  because  they  lead  to  the  development  of  an  efficient  algorithm  for 
finding  adequate  models. 

•  It  introduces  a  novel  order  of  magnitude  reasoning  method  which  is  used  to 
generate  the  behavior  of  a  physical  system.  The  method  strikes  a  balance 
between  purely  quantitative  and  purely  qualitative  reasoning,  and  is  based  on 
defining  the  order  of  magnitude  of  a  quantity  on  a  logarithmic  scale.  This 
makes  the  method  applicable  even  in  the  presence  of  non-linear  simultaneous 
equations. 

•  It  introduces  the  component  interaction  heuristic  that  is  useful  in  finding  causal 
models. 

•  It  describes  an  implemented  representation  methodology  for  representing  the 
space  of  possible  models  of  a  physical  system. 

•  It  describes  the  results  of  testing  our  implementation  of  the  model  selection 
algorithm on a variety of electromechanical devices.

1.5  Reader’s guide

The  rest  of  the  thesis  presents  the  details  of  our  solution  to  the  model  selection 
problem.  Chapter  2  is  a  detailed  answer  to  the  first  of  our  three  questions.  It  describes 
models  and  model  fragments,  and  shows  how  they  are  represented.  Chapter  3  is  a 
detailed  answer  to  the  second  of  our  three  questions.  It  describes  our  criteria  for  the 
adequacy  of  a  model.  These  two  chapters  are  central  to  understanding  this  thesis. 
Chapter 4 presents a formalization of the model selection problem, and uses this
formalization to analyze the complexity of finding adequate models. It shows that the
general problem of finding adequate models is NP-hard, and identifies three different
sources  of  intractability.  Section  4.1,  which  presents  the  formalization,  is  necessary 
for  understanding  Chapters  5  and  6.  However,  readers  not  interested  in  the  details 
of  the  proofs  of  intractability  can  skip  the  rest  of  the  chapter. 

Chapter  5  contains  some  of  the  main  results  of  this  thesis.  It  introduces  the  upward 
failure property, and uses the upward failure property to develop an efficient algorithm
for finding an adequate model. It then introduces a number of local properties
of a knowledge base that ensure that the global upward failure property is satisfied.
In particular, this chapter introduces the class of causal approximations, and discusses
their role in ensuring that the upward failure property is satisfied. Chapter 6
generalizes  the  results  of  Chapter  5  to  models  involving  differential  equations. 

Chapter  7  presents  the  novel  order  of  magnitude  reasoning  method  that  we  use 
to  generate  the  behavior  of  the  physical  system.  This  behavior  is  used  to  evaluate 
some of the domain-dependent constraints introduced in Chapter 3. This chapter is
self-contained,  and  can  be  read  independently  of  the  rest  of  the  thesis. 

Finally, Chapter 8 presents the component interaction heuristic and the implemented
program for model selection. It also reports on our experimental results.
Related work is discussed in Chapter 9, and conclusions and future work are discussed
in Chapter 10.

We  conclude  this  introductory  chapter  with  a  brief  note  on  short  papers  that 
describe  different  aspects  of  this  thesis.  The  main  results  of  Chapters  4  and  5  are 
presented  in  [Nayak,  1992a].  An  overview  of  some  aspects  of  Chapters  2,  3,  and  8  is 
presented  in  [Nayak  et  al.,  1992].  Finally,  much  of  Chapter  7  is  reproduced  in  [Nayak, 
1992b]. 


Chapter  2 

Models  and  model  fragments 


In this chapter we describe the types of models that we consider in this thesis. Fundamentally,
we will be considering models of the behavior of physical systems that
are best represented as sets of equations. Section 2.1 discusses the different types of
equations  that  can  be  used  in  models  of  physical  systems,  and  Section  2.2  discusses 
the  need  for  multiple  models  of  a  single  system.  The  next  two  sections  introduce 
model  fragments,  and  show  how  model  fragments  can  be  used  to  represent  the  space 
of  possible  models  of  a  physical  system.  The  final  section  of  this  chapter  discusses 
the  actual  representational  mechanisms  that  we  use  to  implement  these  ideas.  In 
particular,  we  introduce  a  class  level  description  of  components  and  model  fragments, 
and  show  how  these  classes  are  organized. 


2.1  Models  of  the  behavior  of  physical  systems 


In  this  thesis  we  will  be  concerned  with  models  of  the  behavior  of  physical  systems, 
typically of engineered devices. (In the rest of the thesis we will use “device” as a
synonym for “physical system.”) Models of device behavior are best represented as a
set of equations that relate a set of parameters.

2.1.1 Parameters

A  parameter  is  a  numerical  attribute  representing  a  physical  property  of  the  device, 
e.g.,  temperature  of  an  object,  voltage  drop  across  an  electrical  conductor,  magnetic 
field  in  a  region.  Parameters  are  usually  functions  of  both  time  and  space,  e.g.,  the 
temperature  of  an  object  can  vary  with  time  and  with  location  within  the  object. 

It  is  common  to  disregard  the  dependence  of  parameter  values  on  space  and/or 
time.  A  lumped  parameter  model  disregards  the  dependence  of  parameter  values  on 
spatial  location.  Such  models  make  the  assumption  that  the  variation  of  parameter 
values  over  a  specific  region  of  space  is  negligible,  with  the  primary  variation  being  as 
a  function  of  time.  For  example,  we  may  choose  to  model  the  temperature  of  an  object 
as  a  lumped  parameter,  i.e.,  assume  that  the  temperature  is  uniform  throughout  the 
object,  though  the  temperature  may  still  vary  with  time. 

An  equilibrium  model  disregards  the  dependence  of  parameter  values  on  time. 
Such  models  are  useful  for  modeling  the  asymptotic  behavior  of  devices,  i.e.,  device 
behavior  after  a  sufficiently  long  time  has  elapsed,  so  that  any  transient  behavior  has 
died  out.  For  example,  consider  a  wall  separating  a  heated  room  from  the  cold  air 
outside.  An  equilibrium  model  can  be  used  to  model  the  eventual  temperature  profile 
in  the  wall. 

2.1.2  Equations 

Equations  are  relations  between  parameters.  Different  types  of  equations  are  used  to 
represent different types of models. The most general types of equations are partial
differential equations. Partial differential equations can model the variation of parameter
values over both time and space. For example, the well-known Navier-Stokes
equation [Welty et al., 1984] is a partial differential equation that describes fluid flow
as a function of both time and space.

Ordinary  differential  equations  can  model  the  variation  of  parameter  values  only 
as a function of a single independent variable, such as time. Hence, ordinary differential
equations are used to represent lumped parameter device models. For example,
Hooke’s law [Halliday and Resnick, 1978], combined with Newton’s second law, gives
an ordinary differential equation that describes the behavior of a simple harmonic
oscillator like a spring-block system.

Algebraic  equations  do  not  contain  any  partial  or  total  derivatives.  Hence,  they 
can be used to represent equilibrium, lumped parameter device models. For example,
Ohm’s law [Halliday and Resnick, 1978] is an algebraic equation describing the
relationship  between  current  flow  through  a  resistor  and  the  voltage  drop  across  the 
resistor. 

Another widely used type of equation is the qualitative equation [Bobrow, 1984;
Kuipers, 1986]. Qualitative equations do not relate the exact numerical values of parameters.
Instead, they represent functional dependencies and monotonicity relations
between parameters. For example, if we do not know the exact functional form of the
relation between the resistance of a wire and its temperature, we could use a qualitative
equation to express the fact that the resistance functionally depends on the
temperature, and that increasing the temperature results in an increase in resistance.

In  this  thesis,  we  will  only  consider  lumped  parameter  models.  However,  we  will 
consider  both  time-varying  models,  as  well  as  equilibrium  models.  Hence,  we  have 
the  following: 

•  A  device  model  is  a  set  of  algebraic,  qualitative,  and/or  ordinary  differential 
equations,  relating  a  set  of  parameters. 

Figure  2.1  reproduces  the  temperature  gauge  introduced  in  the  previous  chapter. 
Figure  2.2  shows  a  set  of  equations  that  describe  this  temperature  gauge.  This  set 
of  equations  represents  an  equilibrium  model  of  the  temperature  gauge,  since  no 
differential equations are used. The equation $\mathit{exogenous}(Q)$ represents the fact that
the value of $Q$ is determined exogenously; it can be viewed as a shorthand for the
equation $Q = c$, for some constant $c$. The equation $M^{-}(Q_1, Q_2)$ is a qualitative
equation representing the functional dependence of $Q_1$ on $Q_2$, and the fact that if $Q_2$
increases then $Q_1$ decreases [Kuipers, 1986].
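
As a rough illustration of the definition of a device model given above, the following Python sketch tags each equation with its type and treats a model as an equilibrium model when it contains no differential equation. The names `Kind`, `Equation`, and `is_equilibrium`, and the encoding of equations as strings, are assumptions made for this sketch; they are not the representation used in the thesis.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ALGEBRAIC = "algebraic"
    QUALITATIVE = "qualitative"
    ODE = "ordinary differential"


@dataclass(frozen=True)
class Equation:
    kind: Kind
    text: str              # human-readable form of the equation
    parameters: frozenset  # names of the parameters the equation relates


def is_equilibrium(model):
    """A model is an equilibrium model when it uses no differential equations."""
    return all(eq.kind is not Kind.ODE for eq in model)


# A few equations written in the style of the temperature gauge model.
gauge_excerpt = {
    Equation(Kind.ALGEBRAIC, "V_t = i_t * R_t", frozenset({"V_t", "i_t", "R_t"})),
    Equation(Kind.QUALITATIVE, "M-(R_t, T_t)", frozenset({"R_t", "T_t"})),
    Equation(Kind.ALGEBRAIC, "exogenous(T_t)", frozenset({"T_t"})),
}

# A hypothetical time-varying equation, added only to show the contrast.
transient = Equation(Kind.ODE, "dT_w/dt = g(T_w)", frozenset({"T_w"}))

print(is_equilibrium(gauge_excerpt))
print(is_equilibrium(gauge_excerpt | {transient}))
```

Keeping the equation type explicit makes the equilibrium versus time-varying distinction a property that can be checked on the set of equations itself.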

2.2  Multiple  models 

[figure omitted; surviving label: Container of water]
Figure 2.1: A temperature gauge

Any device can be modeled in many different ways, i.e., it can be described by different
sets of equations. Different device models differ because they give different answers
to the following two fundamental questions:

• What must be modeled? Different models can differ because they choose to
model  different  physical  phenomena.  For  example,  the  physical  phenomena 
modeled  in  Figure  2.2  include  the  heat  generated  due  to  current  flow  in  the  wire, 
but  not  the  electromagnetic  field  generated  by  the  same  current  flow.  Models 
can  also  differ  because  they  choose  different  granularities,  i.e.,  they  choose  a 
different  set  of  objects  to  model.  For  example,  the  model  in  Figure  2.2  chose 
a  granularity  that  includes  the  bimetallic  strip  as  a  single  object.  A  different 
model  might  have  chosen  a  different  granularity,  such  as  one  that  separately 
modeled  the  two  strips  of  the  bimetallic  strip. 

• How must the chosen things be modeled? Even though models may choose to
model the same phenomena at the same level of granularity, they may differ
based on the specific models they choose. For example, the model in Figure 2.2
models electrical conduction in the wire as a constant resistance resistor. However,
other models could have chosen to use different models of electrical conduction,
e.g., by modeling the wire as an ideal conductor, or as a resistor
whose resistance depends on its temperature. Similarly, the model in Figure 2.2
uses an equilibrium model for the temperature of the wire, while other models
might choose to use a differential equation model to model the transient
behavior of the wire’s temperature.

$\theta_p = k_1 x_b$
$x_b = k_2 T_b$
$f_{ba} = k_3 (T_b - T_a)$
$f_{wb} = k_4 (T_w - T_b)$
$\mathit{exogenous}(T_a)$
$f_{ba} = f_{wb}$
$f_{wb} = f_w$
$V_w = i_w R_w$
$\mathit{exogenous}(R_w)$
$f_w =$ [right-hand side illegible in source]
$V_t = i_t R_t$
$M^{-}(R_t, T_t)$
$\mathit{exogenous}(V_v)$
$V_v = V_w + V_t$
$i_v = i_t$
$i_t =$ [right-hand side illegible in source]
$\mathit{exogenous}(T_t)$

$\theta_p$: Pointer angle                 $x_b$: Bms deflection
$R_w$: Wire resistance                    $R_t$: Thermistor resistance
$i_t$: Thermistor current                 $V_t$: Thermistor voltage
$i_w$: Wire current                       $V_w$: Wire voltage
$i_v$: Battery current                    $V_v$: Battery voltage
$T_b$: Bms temperature                    $T_a$: Atm temperature
$T_w$: Wire temperature                   $T_t$: Thermistor temperature
$f_{ba}$: Heat flow (bms to atm)          $f_{wb}$: Heat flow (wire to bms)
$f_w$: Heat generated in wire             $k_j$: Exogenous constants

Figure 2.2: A set of equations describing the temperature gauge


2.3  Model  fragments 

Since  a  device  can  be  modeled  in  a  variety  of  different  ways,  it  is  important  that  we 
be  able  to  represent  the  space  of  possible  models  of  the  device.  We  represent  this 
space  using  model  fragments. 

2.3.1  What  is  a  model  fragment? 

A model fragment is a set of independent equations that partially describe some
physical phenomenon at some level of granularity. Different model fragments can
describe different phenomena, or can be different descriptions of the same phenomenon.
For example, Figure 2.3 shows a model fragment that describes electrical conduction
in a wire by modeling the wire as a resistor. Figure 2.4 shows a different model
fragment that describes the same phenomenon for the wire by modeling the wire as
an ideal conductor. Finally, Figure 2.5 shows a model fragment that describes the
temperature dependence of the wire’s length, a completely different phenomenon.

In  general,  model  fragments  are  only  partial  descriptions  of  phenomena.  For 
example, the model fragment in Figure 2.3 only specifies the relation between the
voltage ($V_w$) and the current ($i_w$); it does not say anything about the variation of the
resistance of the wire. Additional model fragments describing the resistor’s resistance
are  necessary  to  complete  this  description. 

Model fragments can be viewed as either component model instances [de Kleer
and Brown, 1984; Williams, 1984], or process instances [Forbus, 1984]. Component
model instances and process instances usually have applicability conditions (e.g., operating
conditions [de Kleer and Brown, 1984; Williams, 1984] or quantity conditions
[Forbus, 1984]) that determine when the equations can be used. There are
well developed techniques for handling such applicability conditions [Forbus, 1990;
Crawford et al., 1990; Iwasaki and Low, 1991]. Hence, in this thesis, rather than
explicitly modeling and reasoning about these applicability conditions, we assume
that the only model fragments under consideration are the ones whose applicability
conditions are satisfied.

[equation illegible in source]

Figure 2.3: Model fragment describing a wire as a resistor.

$\{V_w = 0\}$

Figure 2.4: Model fragment describing a wire as an ideal conductor.

$\{l_w = l_{w0}(1 + \alpha_w (T_w - T_{w0}))\}$

Figure 2.5: Model fragment describing the temperature dependence of the wire’s
length.


2.3.2  Advantages  of  model  fragments 

A  device  model  is  constructed  by  composing  a  set  of  model  fragments,  i.e.,  rather 
than  viewing  a  model  just  as  a  set  of  equations,  it  is  much  more  useful  to  think  of  it 
as  a  set  of  model  fragments.  Hence,  we  have  the  following  alternative  definition  of  a 
model: 

•  A  model  is  a  set  of  model  fragments  that  describe  some  set  of  phenomena  at 
some  level  of  detail. 

This viewpoint has a number of advantages. First, because model fragments are
partial descriptions of a single phenomenon, they usually consist of a small number of
equations. Hence, constructing a library of model fragments is relatively easy. On the
other hand, device models usually consist of a large number of equations, sometimes
as many as hundreds of equations, because they are complete descriptions of a number
of phenomena. Hence, constructing a model is much more difficult than constructing
a model fragment.

Second,  a  set  of  model  fragments  is  an  implicit  representation  of  a  very  large  set 
of  models.  This  is  because  any  subset  of  this  set  of  model  fragments  can  be  composed 
to form a model.* Hence, a set of model fragments is an implicit representation of
an  exponentially  large  set  of  models.  Alternate  representations  of  this  large  space 
of  models,  by  explicitly  representing  each  model,  are  unrealistic.  To  put  it  another 
way,  explicitly  representing  the  space  of  possible  models  restricts  us  to  representing 
a  much  smaller  set  of  models. 
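
The size of this implicit space can be made concrete with a small Python sketch. It enumerates every subset of a set of model fragments, ignoring for the moment the fact that not every subset is a legal model. The function name `candidate_models` and the fragment name `Thermal-expansion(wire-1)` are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def candidate_models(fragments):
    """Yield every subset of the given model fragments as a frozenset."""
    ordered = sorted(fragments)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            yield frozenset(subset)


wire_fragments = {
    "Resistor(wire-1)",
    "Ideal-conductor(wire-1)",
    "Thermal-expansion(wire-1)",  # hypothetical name for the length fragment
}

models = list(candidate_models(wire_fragments))
print(len(models))  # 2 ** 3 subsets for three fragments
```

With n fragments the generator produces 2^n candidates, which is why the fragments themselves, rather than an explicit list of models, serve as the representation.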

Third,  model  fragments  are  reusable,  not  just  in  different  models  of  the  same 
device,  but  in  different  models  of  different  devices.  For  example,  the  model  fragments 
shown in Figures 2.3-2.5 can be reused, not only in a number of different models of
the  temperature  gauge  shown  in  Figure  2.1,  but  also  in  models  of  other  devices  that 
use  wires.  This  means  that  the  effort  of  constructing  a  library  of  model  fragments 
can  be  amortized  over  their  use  in  a  variety  of  different  models. 

2.3.3  Composing  model  fragments 

The  equations  of  a  device  model  are  created  by  composing  the  equations  of  the  model 
fragments used to construct the model. In most cases, the composition is a straightforward
union of the equations in the model fragments. However, because model
fragments  are  partial  descriptions  of  phenomena,  there  is  a  need  to  have  special  types 
of  expressions  that  provide  only  partial  information  about  equations.  Such  partial 
descriptions  have  associated  with  them  a  set  of  composition  rules  that  are  used  to 
combine  different  partial  descriptions  to  create  a  complete  equation  in  the  model. 

*As we shall see later, not every subset of model fragments can be viewed as a model, but the
basic observation still holds.

Consider, for example, a bathtub partially filled with water. Suppose that a tap
has been turned on to fill up the bathtub. Simultaneously, suppose that the drain
plug in the bathtub has been opened to try and empty the bathtub. The net effect
of these two water flows (i.e., from the tap into the bathtub, and out of the bathtub
through the drain) can be described by the equation:

$$\frac{dV_{tub}}{dt} = f_{tap} - f_{drain}$$

where $V_{tub}$ is the volume of water in the tub, $f_{tap}$ is the rate at which water enters
the tub through the tap, and $f_{drain}$ is the rate at which water leaves the tub through
the drain. For maximum flexibility, it is useful to describe the two water flows using
separate model fragments. What might the equations of these model fragments be?
Intuitively, the model fragment describing the tap water flow must say that the water
flowing through the tap tends to increase the volume of water in the bathtub. Similarly,
the model fragment describing the drain water flow must say that the water
flowing out of the drain tends to decrease the volume of water in the bathtub.

We can express this using the $I+$ and $I-$ operators introduced by Forbus [Forbus,
1984]. $I+(q_1, q_2)$ says that $q_2$ is a positive influence on $q_1$, while $I-(q_1, q_2)$ says that
$q_2$ is a negative influence on $q_1$. Given a set of influences on a parameter $q$, we use
the closed world assumption that these are the only influences on $q$ to construct an
equation. For example, the model fragment describing the tap water flow would have
the equation $I+(V_{tub}, f_{tap})$, and the model fragment describing the drain water flow
would have the equation $I-(V_{tub}, f_{drain})$. Combining these two model fragments, and
assuming that these are the only influences on $V_{tub}$, we get the equation

$$\frac{dV_{tub}}{dt} = f_{tap} - f_{drain}$$

The use of composable operators like $I+$ and $I-$ is crucial to our use of model
fragments as partial descriptions of phenomena. Table 2.1 shows the composable
operators that we use. Brief descriptions have been included in this table, and more
detailed descriptions are provided in Appendix C.
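
The composition of influences under the closed world assumption can be sketched in Python as follows. Each influence is encoded as a `(sign, influenced, influencer)` triple, and all influences on the same parameter are summed into one derivative equation. The function name `compose_influences` and the string form of the result are assumptions of this sketch; the composition rules actually used are the ones described in Appendix C.

```python
def compose_influences(influences):
    """Combine I+ / I- influences into one derivative equation per parameter.

    influences: iterable of (sign, influenced, influencer), sign is "+" or "-".
    Closed world assumption: the listed influences are the only ones.
    """
    grouped = {}
    for sign, target, source in influences:
        if sign not in ("+", "-"):
            raise ValueError(f"unknown influence sign: {sign!r}")
        grouped.setdefault(target, []).append((sign, source))

    equations = {}
    for target, signed_sources in grouped.items():
        rhs = ""
        for sign, source in signed_sources:
            if not rhs:
                # First term: a leading minus is attached directly.
                rhs = source if sign == "+" else "-" + source
            else:
                rhs += f" {sign} {source}"
        equations[target] = f"d{target}/dt = {rhs}"
    return equations


bathtub = [
    ("+", "V_tub", "f_tap"),    # I+(V_tub, f_tap), from the tap fragment
    ("-", "V_tub", "f_drain"),  # I-(V_tub, f_drain), from the drain fragment
]
print(compose_influences(bathtub)["V_tub"])  # dV_tub/dt = f_tap - f_drain
```

Each fragment contributes only its own influence, and the full equation appears only once the set of fragments in the model is fixed.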


2.3.4  Relations  between  model  fragments 

We  now  turn  to  a  discussion  of  some  important  relations  between  model  fragments: 
contradictory  and  approximation.  We  also  introduce  assumption  classes,  and  the 
required  assumption  classes  of  a  model  fragment. 
Operator          Description
I+                Positive influence on derivatives
I-                Negative influence on derivatives
sum-term          Term in a sum
sum-to-zero       Quantities that add up to zero (used for Kirchhoff’s current law)
same-value        Quantities are equal (used for Kirchhoff’s voltage law)
same-reference    Quantities have a common reference (potential) (used for Kirchhoff’s voltage law)
same-circuit      Flows belong to the same circuit (used for Kirchhoff’s current law)

Table 2.1: Composable operators

The contradictory relation

As mentioned earlier, different model fragments can be descriptions of different phenomena,
or can be different descriptions of the same phenomenon. When model fragments
describe the same phenomenon, they often make contradictory assumptions
about the domain. For example, Figure 2.6 shows three different model fragments
describing electrical conduction in a wire, which make contradictory assumptions. In
particular, the ideal conductor model fragment assumes that the resistance of the conductor
is zero, the ideal insulator model fragment assumes that the resistance of the
conductor is infinite, while the resistor model fragment assumes that the resistance
of the conductor is non-zero and finite.

Ideal-conductor(wire-1): $V_w = 0$
Ideal-insulator(wire-1): $i_w = 0$
Resistor(wire-1): $V_w = i_w R_w$

Figure 2.6: Model fragments describing electrical conduction in a wire.

We represent the fact that model fragments make contradictory assumptions about
the domain using the contradictory relation. If $m_1$ and $m_2$ are model fragments, then
$\mathit{contradictory}(m_1, m_2)$ says that $m_1$ and $m_2$ make contradictory assumptions about
the domain. It is important to note that the contradictory relation is a primitive,
domain-dependent relation which cannot, in general, be derived from the equations of
the model fragments. For example, there is nothing intrinsically contradictory about
the equations of the ideal conductor model fragment and the ideal insulator model
fragment, i.e., it is certainly possible that both the current through a conductor and
the voltage drop across the conductor are zero. The contradiction between them is a
domain fact. However, we assume that the contradictory relation is irreflexive (so
that model fragments cannot contradict themselves), and symmetric (so that model
fragments can only be mutually contradictory):

$\neg\mathit{contradictory}(m_1, m_1)$    (2.1)

$\mathit{contradictory}(m_1, m_2) \Rightarrow \mathit{contradictory}(m_2, m_1)$    (2.2)
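
One simple way to obtain irreflexivity and symmetry by construction is to store each declared contradiction as an unordered pair. The Python sketch below does this; the class name `ContradictoryRelation` and its methods are assumptions of the sketch, and the declared pair is entered by hand because, as noted above, the relation is a domain fact that cannot be derived from the equations.

```python
class ContradictoryRelation:
    """Primitive, user-declared relation between model fragments."""

    def __init__(self):
        self._pairs = set()  # each entry is a frozenset of two fragment names

    def declare(self, m1, m2):
        if m1 == m2:
            # Property (2.1): a fragment cannot contradict itself.
            raise ValueError("contradictory is irreflexive")
        # Unordered storage gives property (2.2), symmetry, for free.
        self._pairs.add(frozenset((m1, m2)))

    def holds(self, m1, m2):
        return frozenset((m1, m2)) in self._pairs


conduction = ContradictoryRelation()
conduction.declare("Ideal-conductor(wire-1)", "Ideal-insulator(wire-1)")
print(conduction.holds("Ideal-insulator(wire-1)", "Ideal-conductor(wire-1)"))
```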

The  approximation  relation 

As  discussed  above,  when  two  model  fragments  describe  the  same  phenomenon,  they 
often  make  contradictory  assumptions  about  the  domain.  In  addition  to  specifying 
that  model  fragments  contradict  each  other,  an  engineer  may  be  able  to  specify  that 
one  model  fragment  is  a  more  approximate  description  of  the  phenomenon  than  the 
other.  This  means  that  the  predictions  made  by  the  more  accurate  model  fragment 
are  “closer  to  reality”  than  the  predictions  made  by  the  more  approximate  model 
fragment.  We  represent  such  knowledge  using  the  approximation  relation  between 
model fragments. In particular, $\mathit{approximation}(m_1, m_2)$ says that the model fragment
$m_2$ is a more approximate description of some phenomenon than the model fragment
$m_1$. For example, Figure 2.7 shows some of the approximation relations between the
model  fragments  shown  in  Figure  2.6. 

$\mathit{approximation}$(Resistor(wire-1), Ideal-conductor(wire-1))
$\mathit{approximation}$(Resistor(wire-1), Ideal-insulator(wire-1))

Figure 2.7: Approximation relation between the electrical conduction model fragments.

Once  again,  it  is  important  to  note  that  the  approximation  relation  is  a  primitive, 
domain-dependent  relation,  and  this  relation  cannot,  in  general,  be  derived  from  the 
equations  of  the  model  fragments.  For  example,  there  is  nothing  about  the  equations 
of the ideal conductor model fragment that tells us that it is necessarily a more approximate
description of electrical conduction than the resistor model fragment; this
just happens to be a domain fact discovered by scientists and engineers. However, we
require that the approximation relation be irreflexive, anti-symmetric, and transitive
(so that model fragments are not approximations of themselves, and approximation
forms a partial ordering on the relative accuracy of the model fragments describing a
phenomenon):


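$\neg\mathit{approximation}(m_1, m_1)$    (2.3)

$\mathit{approximation}(m_1, m_2) \Rightarrow \neg\mathit{approximation}(m_2, m_1)$    (2.4)

$\mathit{approximation}(m_1, m_2) \wedge \mathit{approximation}(m_2, m_3) \Rightarrow \mathit{approximation}(m_1, m_3)$    (2.5)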

Furthermore, since approximations make different, and hence contradictory, predictions
about the same phenomenon, we require that all approximations are also mutually
contradictory:

$\mathit{approximation}(m_1, m_2) \Rightarrow \mathit{contradictory}(m_1, m_2)$    (2.6)
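
Properties (2.3) through (2.6) can be checked mechanically on a set of declared approximation pairs. The Python sketch below computes the transitive closure required by (2.5), tests the closure for irreflexivity and anti-symmetry, and derives the contradictions implied by (2.6). All function names are assumptions of this sketch, and a pair `(m1, m2)` is read as "m2 is a more approximate description than m1".

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the declared pairs (2.5)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_strict_partial_order(pairs):
    """Check (2.3) irreflexivity and (2.4) anti-symmetry on the closure."""
    closure = transitive_closure(pairs)
    irreflexive = all(a != b for a, b in closure)
    antisymmetric = all((b, a) not in closure for a, b in closure)
    return irreflexive and antisymmetric


def implied_contradictions(pairs):
    """Unordered contradictory pairs that follow from (2.6)."""
    return {frozenset(pair) for pair in transitive_closure(pairs)}


declared = {
    ("Resistor(wire-1)", "Ideal-conductor(wire-1)"),
    ("Resistor(wire-1)", "Ideal-insulator(wire-1)"),
}
print(is_strict_partial_order(declared))
print(len(implied_contradictions(declared)))
```

A cyclic declaration such as `{("a", "b"), ("b", "a")}` fails the check, because its closure contains `("a", "a")`.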

Assumption  classes 

An  assumption  class  is  a  set  of  model  fragments  that  make  different,  contradictory 
assumptions  about  the  domain.  This  means  that  an  assumption  class  is  a  set  of 
mutually contradictory model fragments, i.e., if $m_1$ and $m_2$ are model fragments, and
$A$ is an assumption class, we have:

$(m_1, m_2 \in A) \wedge m_1 \neq m_2 \Rightarrow \mathit{contradictory}(m_1, m_2)$    (2.7)

One can see that the model fragments in Figure 2.6 form an assumption class describing
electrical conduction in the wire. Figure 2.8 shows two model fragments forming
an  assumption  class  describing  the  resistance  of  a  wire. 
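
Condition (2.7) amounts to a pairwise check over the members of a candidate assumption class. A minimal Python sketch follows, assuming that the contradictory relation is supplied as a set of unordered pairs; the function name `is_assumption_class` is hypothetical.

```python
from itertools import combinations


def is_assumption_class(fragments, contradictory):
    """True when every two distinct fragments are declared contradictory (2.7).

    contradictory: set of frozensets, each holding two fragment names.
    """
    return all(
        frozenset((m1, m2)) in contradictory
        for m1, m2 in combinations(sorted(fragments), 2)
    )


conduction_class = {
    "Ideal-conductor(wire-1)",
    "Ideal-insulator(wire-1)",
    "Resistor(wire-1)",
}
declared_contradictions = {
    frozenset(("Ideal-conductor(wire-1)", "Ideal-insulator(wire-1)")),
    frozenset(("Resistor(wire-1)", "Ideal-conductor(wire-1)")),
    frozenset(("Resistor(wire-1)", "Ideal-insulator(wire-1)")),
}
print(is_assumption_class(conduction_class, declared_contradictions))
```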

Constant-resistance(wire-1): $\mathit{exogenous}(R_w)$
Temperature-dependent-resistance(wire-1): $R_w = R_{w0}(1 + \alpha_w (T_w - T_{w0}))$
    $\mathit{exogenous}(R_{w0})$
    $\mathit{exogenous}(\alpha_w)$
    $\mathit{exogenous}(T_{w0})$

Figure 2.8: Model fragments describing a wire’s resistance.

Recall that model fragments are partial descriptions of phenomena. Additional
model fragments are required to complete this description. We represent the set of
model fragments that can be used to complete a description by associating with each
model fragment a set of required assumption classes. Let $A$ be an assumption class
required by model fragment $m$ (written $\mathit{requires}(m, A)$). This means that to complete
the description of the phenomenon described by $m$, we must include a model fragment
from the assumption class $A$. For example, to complete the description of electrical
conduction described by the resistor model fragment, we require a description of the
resistance, i.e., the Resistor(wire-1) model fragment requires a model fragment
from the assumption class shown in Figure 2.8.
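
The requires relation suggests a simple completeness check on a candidate model: every required assumption class of every fragment in the model must contribute at least one member. The Python sketch below reports the unmet requirements. The function name `missing_requirements` and the class label `wire-1-resistance` are assumptions introduced for illustration.

```python
def missing_requirements(model, requires, assumption_classes):
    """Return (fragment, class name) pairs with no class member in the model.

    model: set of fragment names.
    requires: dict mapping a fragment to the names of its required classes.
    assumption_classes: dict mapping a class name to its set of fragments.
    """
    missing = []
    for fragment in sorted(model):
        for class_name in requires.get(fragment, ()):
            if not (assumption_classes[class_name] & model):
                missing.append((fragment, class_name))
    return missing


assumption_classes = {
    # Hypothetical label for the resistance assumption class.
    "wire-1-resistance": {
        "Constant-resistance(wire-1)",
        "Temperature-dependent-resistance(wire-1)",
    },
}
requires = {"Resistor(wire-1)": ["wire-1-resistance"]}

incomplete = {"Resistor(wire-1)"}
complete = {"Resistor(wire-1)", "Constant-resistance(wire-1)"}
print(missing_requirements(incomplete, requires, assumption_classes))
print(missing_requirements(complete, requires, assumption_classes))
```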


2.4  Space  of  possible  models 

In the previous section, we have argued that a set of applicable model fragments forms
a compact representation of a very large space of possible models. In this section we
discuss the following issue: given a device description, how do we decide which model
fragments are applicable? Our answer to this issue can be summarized as follows:

•  The  set  of  applicable  model  fragments  is  the  union  of  the  model  fragments 
associated  with  the  components  of  the  device. 

We  now  discuss  this  in  detail. 

2.4.1  Device  structure 

The structure of a device is a description of the device which specifies the components,
or parts, of the device, physical properties of these components, and how these
components are put together to form the device.

The components that can be used to describe the structure of a device are drawn
from a library of component types. For example, to define the structure of the temperature
gauge shown in Figure 2.1, the component library must contain component
types like thermistor, wire, battery, bimetallic-strip, and pointer.

Components are put together to form a device description with the use of structural
relations. These relations are drawn from a library of structural relations, and
include  relations  such  as  connected-to  (indicating  that  two  component  terminals  are 
connected),  coiled-around  (indicating  that  a  wire  is  coiled  around  a  component), 
and  meshed  (indicating  that  a  pair  of  gears  mesh  with  each  other).  Figure  2.9  shows 
a  structural  description  of  the  temperature  gauge  in  Figure  2.1. 


2.4.2  Structural  abstractions 

The  structure  of  a  device  specifies  the  basic  set  of  components  in  the  device.  This 
basic  set  of  components  can  be  augmented  by  recognizing  structural  abstractions. 
Structural  abstractions  are  components  that  represent  a  set  of  other  components  in 
specific  structural  configurations.  For  example,  components  of  the  Coil-structure 
component  type  represent  objects  corresponding  to  a  wire  coiled  around  another 
object. 

The  component  library  contains  rules  that  can  be  used  to  recognize  instances  of 
a  structural  abstraction  in  the  structural  description  of  a  device.  For  example,  the 
following  rule  is  used  to  recognize  Coil-structures: 

(implies
  (and (Wire ?object)
       (coiled-around ?object ?core))
  (exists ?struc Coil-structure
    (and (coil-structure-wire ?struc ?object)
         (coil-structure-core ?struc ?core))))

Therefore, the set of all components of a device consists of the union of the set of
basic components specified in the structural description, and the set of all structural
abstractions that can be recognized using the rules in the component library. For
example, applying the above rule to the device structure shown in Figure 2.9, we see
that an instance of the Coil-structure is recognized, corresponding to the wire, ?W,
being coiled around the bimetallic strip, ?B.

(defdevice bimetallic-strip-temperature-gauge
  ((?V Battery)
   (?T Thermistor)
   (?W Wire)
   (?B Bimetallic-strip)
   (?P Pointer)
   (?L Linkage)
   (?ATM Atmosphere)
   (?A1 Axis)
   (?A2 Axis))
  (connected-to (battery-terminal-one ?V)
                (wire-terminal-one ?W))
  (connected-to (battery-terminal-two ?V)
                (thermistor-terminal-one ?T))
  (connected-to (thermistor-terminal-two ?T)
                (wire-terminal-two ?W))
  (connected-to (bms-terminal-two ?B)
                (linkage-terminal-one ?L))
  (connected-to (pointer-terminal-two ?P)
                (linkage-terminal-one ?L))
  (coiled-around ?W ?B)
  (immersed-in ?B ?ATM)
  (immersed-in ?P ?ATM)
  (immersed-in ?V ?ATM)
  (immersed-in ?L ?ATM)
  (fixed-object (bms-terminal-one ?B))
  (can-rotate ?P ?A2)
  (bms-deformation-axis ?B ?A1))

Figure 2.9: Structural description of the temperature gauge.
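
The effect of the Coil-structure recognition rule can be imitated with a few lines of Python. The sketch below scans the `coiled-around` facts of a structural description and creates one abstraction for each wire coiled around a core. The dictionary-based encoding and the function name `recognize_coil_structures` are assumptions of this sketch; the thesis expresses the rule in its own rule language.

```python
def recognize_coil_structures(component_types, coiled_around):
    """Create one Coil-structure per (wire, core) pair whose first member is a Wire.

    component_types: dict mapping a component name to its component type.
    coiled_around: set of (object, core) pairs from the structural description.
    """
    structures = []
    for wire, core in sorted(coiled_around):
        if component_types.get(wire) == "Wire":
            structures.append({"type": "Coil-structure", "wire": wire, "core": core})
    return structures


# Excerpt of the temperature gauge structure.
component_types = {"?W": "Wire", "?B": "Bimetallic-strip", "?P": "Pointer"}
coiled_around = {("?W", "?B")}

print(recognize_coil_structures(component_types, coiled_around))
```

The full set of components is then the basic components together with every abstraction returned by rules of this kind.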

2.4.3  Possible  models  of  a  component 

The  space  of  model  fragments  that  can  be  used  to  construct  a  device  model  is  defined 
by  associating  with  each  component  of  the  device,  whether  a  basic  component  or  a 
structural  abstraction,  a  set  of  model  fragments  that  can  be  used  to  describe  that 
component.  For  example,  we  could  associate  with  a  wire,  wire-1,  the  following  model 
fragment  describing  electrical  conduction  in  the  wire: 

$\{V_w = i_w R_w\}$

As discussed earlier, model fragments can be viewed either as “component models” [de
Kleer and Brown, 1984; Williams, 1984] or “process models” [Forbus, 1984]. Hence, a
model fragment associated with a component is a partial description of some physical
phenomenon, including some physical process, occurring in that component. It is worth
noting that model fragments associated with structural abstractions can be used to
represent physical processes that take place over more than one basic component.
For example, if cs1 is a structural abstraction representing the wire coiled around
the bimetallic strip, then we could associate with it the following model fragment,
describing heat flow from the wire to the bimetallic strip:

$\{f_{cs1} = \gamma_{cs1}(T_{w1} - T_{b1})\}$

where $f_{cs1}$ is the heat flow, $\gamma_{cs1}$ is the thermal conductance, $T_{w1}$ is the temperature
of the wire, and $T_{b1}$ is the temperature of the bimetallic strip.

In  summary,  the  space  of  possible  models  of  a  device  is  represented  implicitly  by 
the  set  of  applicable  model  fragments  that  can  be  composed  to  form  models  of  the 
device.  The  set  of  applicable  model  fragments  is  the  union  of  the  model  fragments 
associated  with  each  of  the  components  of  the  device.  In  the  next  section  we  discuss 
our  representation  of  the  space  of  model  fragments  that  can  be  used  to  describe  a 
component. 
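
The implicit definition of the model space can be illustrated with a small sketch. The dictionary layout, the function names, and the string form of the equations below are assumptions made purely for illustration; they are not the representation used in the implemented system.

```python
from itertools import combinations

# Hypothetical library: component name -> model fragments (equations as strings).
fragments_by_component = {
    "wire-1": {"V_w = i_w * R_w"},
    "cs1": {"f_cs1 = gamma_cs1 * (T_w1 - T_b1)"},
}

def applicable_fragments(library):
    """Union of the model fragments associated with every component."""
    result = set()
    for fragments in library.values():
        result |= fragments
    return result

def candidate_models(fragments):
    """Enumerate every subset of the applicable fragments (the model space)."""
    items = sorted(fragments)
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            yield frozenset(subset)

applicable = applicable_fragments(fragments_by_component)
models = list(candidate_models(applicable))
```

With n applicable model fragments the enumeration yields 2^n candidate models, which is why the space is described implicitly rather than listed.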
2.5  Model  fragment  classes

Thus  far  we  have  been  talking  about  components  and  model  fragments  as  instance 
level  descriptions,  i.e.,  a  component  is  a  specific  component  used  in  a  specific  device, 
and  a  model  fragment  is  the  specific  set  of  equations  describing  some  physical  phe¬ 
nomenon  in  a  specific  component.  However,  building  a  library  of  components  and 
model  fragments  requires  that  we  provide  class  level  descriptions,  i.e.,  descriptions  of 
classes  of  components  and  model  fragments  that  can  be  instantiated  to  create  struc¬ 
tural  descriptions  and  models  for  a  variety  of  devices.  To  this  end,  we  have  devised 
an  implemented  language  for  specifying  class  level  descriptions  of  components  and 
model  fragments.  We  now  describe  this  language,  and  show  how  we  represent  the 
information  described  above. 

2.5.1  What  are  component  and  model  fragment  classes? 

Component and model fragment classes are just classes, where a class is viewed as
a  set  of  instances.  Component  classes  are  class  level  descriptions  of  components: 
components  are  just  instances  of  the  corresponding  component  classes.  Model  frag¬ 
ment  classes  are  class  level  descriptions  of  phenomena.  A  component  is  modeled  by 
a  particular  model  fragment  class  by  making  the  component  an  instance  of  the  class. 

Following [Hayes, 1979], classes can be viewed as unary predicates that are true
of their instances. Functions and higher arity predicates are implemented as slots on
instances. For example, if s is a binary predicate, and u and v are instances, then the
literal s(u,v) is represented by placing the instance v on the s slot of the instance u.
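
The slot encoding of a binary literal can be sketched as follows. The `Instance` class and its method names are hypothetical and serve only to illustrate the idea of storing s(u,v) on the s slot of u.

```python
class Instance:
    """An instance with named slots; each slot holds a set of filler instances."""

    def __init__(self, name):
        self.name = name
        self.slots = {}

    def add(self, slot, filler):
        # Asserting slot(self, filler) places filler on the given slot of self.
        self.slots.setdefault(slot, set()).add(filler)

    def holds(self, slot, filler):
        # The literal slot(self, filler) is true iff filler is on the slot.
        return filler in self.slots.get(slot, set())

u = Instance("u")
v = Instance("v")
u.add("s", v)  # represents the literal s(u, v)
```

Note that the encoding is directional: placing v on the s slot of u says nothing about s(v,u).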

Component  and  model  fragment  classes  inherit  various  properties  to  their  in¬ 
stances.  The  most  important  property  that  a  model  fragment  class  inherits  to  its 
instances  is  the  equations  describing  the  phenomena.  These  inherited  equations  form 
the  model  fragment  describing  the  physical  phenomena  for  that  instance. 

2.5.2  Typographic  conventions 

A  few  notes  on  typographic  conventions. 
•  Names of components, model fragments, component classes, and model fragment
classes will be typeset in typewriter font.

•  Class  names  begin  with  an  uppercase  letter,  while  slot  names  and  instance 
names  begin  with  a  lower  case  letter. 

•  If  M  is  a  model  fragment  class,  and  c  is  a  component,  then  M(c)  denotes  the 
model  fragment  resulting  from  modeling  c  as  an  instance  of  M.  M(c)  will  also 
be  used  to  represent  the  ground  literal  expressing  the  fact  that  c  is  an  instance 
of  M.  It  will  always  be  clear  from  the  context  whether  M(c)  represents  a  model 
fragment  or  a  literal. 

•  Instances  of  component  classes  will  often  have  names  formed  by  concatenating 
the  name  of  the  component  class  with  a  number. 

•  Variable names will start with the “?” character. The variable “?object”
used in class definitions is bound to the class instance under consideration.

To  illustrate  some  of  the  above  conventions,  let  Wire  be  a  component  class  repre¬ 
senting  the  set  of  all  wires,  and  let  Resistor  be  a  model  fragment  class  representing 
the  set  of  all  resistors.  Let  wire-1  be  an  instance  of  Wire.  To  model  wire-1  as  a 
resistor,  we  would  make  it  an  instance  of  Resistor,  with  the  corresponding  model 
fragment being Resistor(wire-1). Note that, since wire-1 is now an instance of
both Wire and Resistor, the literals Wire(wire-1) and Resistor(wire-1) are both
true. 

2.5.3  Defining  component  and  model  fragment  classes 

Component  and  model  fragment  classes  are  defined  using  the  defmodel  macro.  Fig¬ 
ure  2.10  shows  the  definition  of  the  Resistor  model  fragment  class.  We  now  discuss 
various  parts  of  this  definition. 
(defmodel Resistor (Electrical-conductor)
  ((attributes
     (resistance
       :range Resistance-parameter
       :documentation "The resistor's resistance"))
   (equations
     (= (voltage-difference ?object)
        (* (resistance ?object)
           (current (electrical-terminal-one ?object)))))
   (assumption-class electrical-conductor-class)
   (approximations Ideal-conductor)
   (required-assumption-classes resistance-class)
   (possible-models Constant-resistance
                    Temperature-dependent-resistance)))


Figure 2.10: The Resistor model fragment class.

Generalization  hierarchy 

Component and model fragment classes are organized into a generalization hierarchy,
representing the “subset-of” relation between classes. The use of a generalization
hierarchy, in conjunction with inheritance, is a very powerful tool for building knowledge
bases because it facilitates reuse and knowledge base maintenance: (a) knowledge
represented with a class can be used, not just by direct instances of the class, but
also by instances of many different classes that are subclasses (specializations) of the
class; and (b) since knowledge needs to be represented only with the most general
class to which the knowledge is applicable, knowledge base maintenance is facilitated
since most changes tend to be localized.

The  second  argument  to  the  defmodel  macro  specifies  the  list  of  classes  that  are 
immediate  generalizations  of  the  defined  class.  Hence,  the  Electrical-conductor 
class  is  an  immediate  generalization  of  the  Resistor  class.  Logically  this  is  equivalent 
to  the  following  axiom: 

Resistor(?object) ⇒ Electrical-conductor(?object)

From the point of view of model fragments used in a model, this means that any
model that includes the model fragment Resistor(?object) also includes the model
fragment Electrical-conductor(?object).
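
The effect of the generalization axiom on model contents can be sketched as a closure computation. The table below lists only the one generalization link stated in the text; the table layout and the function names are assumptions for illustration, and any further generalizations of Electrical-conductor are deliberately left out.

```python
# Immediate generalizations, as given by the second argument to defmodel.
# Only the link stated in the text is listed; the table is illustrative.
GENERALIZATIONS = {
    "Resistor": ["Electrical-conductor"],
    "Electrical-conductor": [],
}

def with_generalizations(cls, table=GENERALIZATIONS):
    """Return cls together with all of its transitive generalizations."""
    seen = []
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(table.get(current, []))
    return seen

def fragments_in_model(cls, component):
    """Model fragments implied by modeling component as an instance of cls."""
    return {f"{name}({component})" for name in with_generalizations(cls)}
```

For instance, `fragments_in_model("Resistor", "wire-1")` contains both Resistor(wire-1) and Electrical-conductor(wire-1), mirroring the axiom above.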

Parameters  and  other  attributes 

Parameters  are  represented  as  instances  of  a  subclass  of  the  Parameter  class.  For 
example,  parameters  representing  voltages  are  instances  of  the  Voltage-parameter 
class,  while  parameters  representing  resistances  are  instances  of  the  Resistance-pa¬ 
rameter  class.  Both  Voltage-parameter  and  Resistance-parameter  are  subclasses 
of  Parameter. 

Recall  that  parameters  represent  numerical  attributes  of  a  device,  in  particular,  of 
components.  The  relationship  between  a  component  and  a  parameter  that  represents 
a  particular  attribute  of  the  component  is  represented  by  unary  functions,  called 
parameter functions. For example, voltage-difference is a parameter function that
returns  the  instance  of  Voltage-parameter  which  represents  the  voltage  difference 
across  a  component  being  modeled  as  an  Electrical-conductor. 

The  attributes  clause  in  the  definitions  of  model  fragment  classes  defines  the 
parameter  functions  that  can  be  used  on  components  being  modeled  by  that  model 
fragment class. The definition of the parameter function includes a :range specification,
which is the class of the parameter returned by the function. For example, the
Resistor model fragment class defines the resistance parameter function, which
returns  an  instance  of  Resistance-parameter  representing  the  resistance  of  compo¬ 
nents  being  modeled  as  Resistors. 

The attributes clause is also used to define functions that return other attributes
of components. For example, two important attributes of an electrical
conductor  are  the  two  terminals  of  the  conductor.  (Conceptually,  terminals  are 
parts  of  the  component  that  allow  the  component  to  interact  with  other  compo¬ 
nents  by  sharing  parameters  [de  Kleer  and  Brown,  1984].)  Figure  2.11  shows  the 
definition  of  the  Two-terminal-electrical-component  model  fragment  class.  The 
attributes  clause  in  this  definition  defines  the  functions  electrical-terminal-one 
and electrical-terminal-two, which return the two Electrical-terminals of the
electrical  component. 
(defmodel Two-terminal-electrical-component (Model-fragment)
  ((attributes
     (electrical-terminal-one
       :range Electrical-terminal
       :documentation "One end of the electrical component")
     (electrical-terminal-two
       :range Electrical-terminal
       :documentation "The other end of the electrical component"))
   (equations
     (= (current (electrical-terminal-one ?object))
        (current (electrical-terminal-two ?object)))
     (same-circuit (current (electrical-terminal-one ?object))
                   (current (electrical-terminal-two ?object)))
     (same-reference (voltage (electrical-terminal-one ?object))
                     (voltage (electrical-terminal-two ?object))))))


Figure  2.11:  The  Two-terminal-electrical-component  model  fragment  class. 

The  attributes  that  a  component  inherits  from  a  model  fragment  class  are  often 
related  to  attributes  that  it  inherits  from  a  component  class.  For  example,  in  mod¬ 
eling  a  wire  as  an  electrical  conductor  between  its  two  ends,  the  two  terminals  of 
the  electrical  conductor  correspond  to  the  two  ends  of  the  wire.  We  enforce  such 
relationships using a set of rules, which are similar to articulation axioms in [Hobbs,
1985].

Equations 

The equations that a model fragment class inherits to its instances are defined using
the equations clause. These equations are defined using equation schemas. Equation
schemas are exactly like equations, except that parameters are replaced by terms like
(resistance ?object). To instantiate such equation schemas for specific instances
of the model fragment class, the variable “?object” is bound to the instance, and the
terms are replaced by the parameter resulting from evaluating the term. For example,
if resistance(wire-1) = resistance-parameter-1, then evaluating the term
(resistance ?object) for the instance wire-1 results in resistance-parameter-1.
Hence, if wire-1 is modeled as a Resistor, then wire-1 inherits the equation:

(= voltage-parameter-1 (* resistance-parameter-1 current-parameter-1))
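
Schema instantiation can be sketched as a recursive substitution. In the sketch below the schema is written as nested tuples, slot values are held in a lookup table, and the terminal name `terminal-1` is invented for the example; none of this is claimed to match the actual implementation.

```python
# Hypothetical slot values for wire-1. Parameter names follow the text's
# example; "terminal-1" is an invented name for the wire's first terminal.
PARAMETER_FUNCTIONS = {
    ("voltage-difference", "wire-1"): "voltage-parameter-1",
    ("resistance", "wire-1"): "resistance-parameter-1",
    ("electrical-terminal-one", "wire-1"): "terminal-1",
    ("current", "terminal-1"): "current-parameter-1",
}

# The Resistor equation schema written as nested tuples.
RESISTOR_SCHEMA = (
    "=",
    ("voltage-difference", "?object"),
    ("*", ("resistance", "?object"),
          ("current", ("electrical-terminal-one", "?object"))),
)

OPERATORS = {"=", "*"}

def instantiate(schema, instance):
    """Bind ?object to instance and replace each term by the value it evaluates to."""
    if schema == "?object":
        return instance
    if isinstance(schema, str):
        return schema
    head, *args = schema
    values = [instantiate(arg, instance) for arg in args]
    if head in OPERATORS:
        return (head, *values)
    # A term such as (resistance ?object): look up the slot value.
    return PARAMETER_FUNCTIONS[(head, values[0])]

equation = instantiate(RESISTOR_SCHEMA, "wire-1")
```

The result corresponds to the inherited equation shown above, with every term replaced by the parameter it evaluates to.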

Assumption  classes 

The  assumption-class  clause  in  a  model  fragment  class  specifies  the  assumption 
class  of  the  model  fragments  which  are  instances  of  the  model  fragment  class.  More 
precisely, let c be a component and let M1 and M2 be model fragment classes. The
model fragments M1(c) and M2(c) are in the same assumption class if and only if the
assumption-class clauses in both M1 and M2 specify the same assumption class. Let
both M1 and M2 specify A in their assumption-class clause. We let the expression
A(c) denote the assumption class of the model fragments M1(c) and M2(c).¹ Furthermore,
we will sometimes say “the assumption class of M1 is A,” meaning that for
any component c, the model fragment M1(c) is in assumption class A(c).

For  example,  we  can  see  that  the  assumption-class  clause  in  Resistor’s  defini¬ 
tion  specifies  electrical-conductor-class.  Suppose  that  the  assumption-class 
clause  in  Ideal-conductor’s  definition  also  specifies  electrical-conductor-class. 
This means that for a component such as wire-1, the model fragments
Resistor(wire-1) and Ideal-conductor(wire-1) are in the assumption class
electrical-conductor-class(wire-1).
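
The definition of "same assumption class" can be sketched with a lookup table. The table and function names are assumptions for illustration; the entry for Ideal-conductor follows the supposition made in the text, and the pair returned by `assumption_class` plays the role of A(c).

```python
# Assumption class named in each model fragment class's assumption-class clause.
# The Ideal-conductor entry follows the supposition made in the text.
ASSUMPTION_CLASS = {
    "Resistor": "electrical-conductor-class",
    "Ideal-conductor": "electrical-conductor-class",
    "Constant-resistance": "resistance-class",
}

def assumption_class(model_class, component):
    """Return A(c) as a pair: the assumption class name and the component."""
    return (ASSUMPTION_CLASS[model_class], component)

def same_assumption_class(m1, m2, component):
    """True iff M1(c) and M2(c) are in the same assumption class."""
    return assumption_class(m1, component) == assumption_class(m2, component)
```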

¹ This is a slight abuse of notation. While it is similar to our convention that M1(c) denotes the
model fragment resulting from modeling the component c as an instance of the M1 model fragment
class, it certainly does not mean that A is a unary predicate so that A(c) is a literal meaning that c
is an instance of A. To prevent any confusion, we will always refer to A(c) as “the assumption class
A(c).”

Approximations

The approximations clause in a model fragment class specifies the model fragments
that are approximations of instances of that class. More precisely, let c be a component
and let M1 and M2 be model fragment classes. The model fragment M2(c) is an
approximation of the model fragment M1(c) if and only if the approximations clause
in M1 specifies M2. For example, we can see that the approximations clause of the
Resistor model fragment class specifies Ideal-conductor. This means that for a
component such as wire-1, the model fragment Ideal-conductor(wire-1) is an approximation
of the model fragment Resistor(wire-1). To relate this to terminology
introduced in a previous section, we have:

approximation(Resistor(wire-1), Ideal-conductor(wire-1))

Similarly,  the  contradictory  clause  in  a  model  fragment  class  specifies  the  model 
fragments  that  contradict  the  instances  of  that  class.  Figure  2.10  does  not  show  a 
contradictory  clause  because  the  contradiction  between  Resistor  and  Ideal-con¬ 
ductor  can  be  inferred  from  the  approximations  clause. 

Required  assumption  classes 

The  required-assumption-classes  clause  in  a  model  fragment  class  specifies  all 
the  assumption  classes  that  are  required  to  complete  the  description  of  model  frag¬ 
ments  that  are  instances  of  that  model  fragment  class.  More  precisely,  suppose  that 
c  is  a  component  and  M  is  a  model  fragment  class.  Suppose  that  M  specifies  A  as  a 
required-assumption-class.  This  means  that,  to  complete  the  description  spec¬ 
ified  by  the  model  fragment  M(c),  we  must  include  a  model  fragment  from  the  as¬ 
sumption  class  A(c).  For  example,  the  required-assumption-classes  clause  of 
the Resistor model fragment class specifies resistance-class. This means that
for a component such as wire-1, the description specified by the model fragment
Resistor(wire-1) must be completed by including a model fragment from the assumption
class resistance-class(wire-1), i.e., by including either Constant-resistance(wire-1)
or Temperature-dependent-resistance(wire-1). To relate it
to terminology introduced earlier, we have:

requires(Resistor(wire-1), resistance-class(wire-1))
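
A completeness check based on the requires relation might look like the sketch below. The two tables and the function name are hypothetical; a model is represented here simply as a set of (model fragment class, component) pairs.

```python
# Hypothetical tables derived from the assumption-class and
# required-assumption-classes clauses discussed in the text.
ASSUMPTION_CLASS = {
    "Resistor": "electrical-conductor-class",
    "Constant-resistance": "resistance-class",
    "Temperature-dependent-resistance": "resistance-class",
}
REQUIRED = {"Resistor": ["resistance-class"]}

def missing_requirements(model):
    """Return (class, component, assumption class) triples that are unmet.

    model is a set of (model_fragment_class, component) pairs.
    """
    provided = {(ASSUMPTION_CLASS[m], c) for m, c in model if m in ASSUMPTION_CLASS}
    missing = set()
    for m, c in model:
        for required in REQUIRED.get(m, []):
            if (required, c) not in provided:
                missing.add((m, c, required))
    return missing

incomplete = {("Resistor", "wire-1")}
complete = incomplete | {("Constant-resistance", "wire-1")}
```

Here `incomplete` lacks a fragment from resistance-class(wire-1), while `complete` satisfies the requirement by including Constant-resistance(wire-1).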

Possible  models 

Recall  that  the  space  of  device  models  was  defined  by  the  set  of  model  fragments 
that  can  be  used  to  describe  the  device.  This  set  of  model  fragments  was  the  union 
of the model fragments that can be used to describe the components of the device.
This  means  that  we  need  a  representation  of  the  set  of  model  fragments  that  can  be 
used  to  model  a  component.  A  straightforward  way  to  represent  this  set  of  model 
fragments  is  to  associate  with  each  component  class  the  set  of  model  fragment  classes 
which  can  be  used  to  model  instances  of  that  component  class. 

For  example,  some  of  the  model  fragment  classes  we  could  associate  with  the 
Wire component class would include Ideal-conductor, Constant-resistance, and
Temperature-dependent-resistance.  This  would  represent  the  fact  that  any  in¬ 
stance  of  Wire  can  be  modeled  as  an  instance  of  the  associated  model  fragment 
classes. 

While  the  above  approach  is,  in  principle,  correct,  a  much  better  approach  is  to  use 
a possible models hierarchy. The basic intuition underlying this approach is the observation
that model fragment classes like Ideal-conductor, Constant-resistance,
and Temperature-dependent-resistance are all models of electrical conduction.
Hence,  it  would  be  much  better  if  we  only  had  to  represent  the  fact  that  an  instance 
of  Wire  can  be  modeled  as  an  Electrical-conductor,  with  additional  electrical 
conductor  models  being  associated  with  Electrical-conductor.  Similarly,  rather 
than  associating  all  the  electrical  conductor  models  with  Electrical-conductor, 
we  would  associate  only  Ideal-conductor  and  Resistor  with  it,  and  associate 
Constant-resistance and Temperature-dependent-resistance with Resistor.
In  essence,  we  build  a  hierarchy  of  possible  models. 

The advantages of the possible models hierarchy are very similar to the advantages
of  a  generalization  hierarchy.  First,  it  leads  to  compact  representations.  For  example, 
one  only  needs  to  specify  that  instances  of  Wire  can  be  modeled  as  Electrical-con¬ 
ductor,  with  additional  ways  of  modeling  instances  of  Wire  being  inferred  from  the 
hierarchy.  Second,  knowledge  base  maintenance  is  simplified.  For  example,  if  we  want 
to  add  an  additional  model  fragment  class  describing  yet  another  electrical  conductor 
model,  e.g.,  the  dependence  of  the  resistance  on  length,  then  this  change  need  only 
be made to the possible models hierarchy below Resistor; definitions of component
classes, like Wire, need not be modified.

The possible models of a model fragment class are defined in the possible-models
clause of the defmodel macro. For example, we can see that the model fragment
classes that can be used to model instances of Resistor include Constant-resistance
and Temperature-dependent-resistance.
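
The way additional models are inferred from the hierarchy can be sketched as a traversal. The table reproduces only the links described in the text; its layout and the function name are assumptions, and the sketch assumes the hierarchy is acyclic.

```python
# Possible-models links described in the text (illustrative, not exhaustive).
POSSIBLE_MODELS = {
    "Wire": ["Electrical-conductor"],
    "Electrical-conductor": ["Ideal-conductor", "Resistor"],
    "Resistor": ["Constant-resistance", "Temperature-dependent-resistance"],
}

def all_possible_models(cls, table=POSSIBLE_MODELS):
    """Collect every class reachable from cls through possible-models links."""
    found = []
    for child in table.get(cls, []):
        if child not in found:
            found.append(child)
        for descendant in all_possible_models(child, table):
            if descendant not in found:
                found.append(descendant)
    return found

wire_models = all_possible_models("Wire")
```

Only the single link from Wire to Electrical-conductor is stored with the component class; the remaining four ways of modeling a wire are inferred from the hierarchy.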

Note  that  the  generalization  hierarchy  and  the  possible  models  hierarchy  often 
overlap.  For  example,  Resistor  is  both  a  specialization  and  a  possible  model  of 
Electrical-conductor.  However,  the  two  hierarchies  are  not  the  same.  For  example, 
the  Thermal-thermistor  model  fragment  class,  which  models  the  dependence  of  a 
thermistor’s  resistance  on  its  temperature,  is  a  specialization  of  the  Thermal-object 
model  fragment  class.  However,  it  is  evident  that  not  all  components  being  modeled  as 
Thermal-objects can be modeled as Thermal-thermistors; only thermistors can be
modeled  as  Thermal-thermistors.  Hence,  Thermal-thermistor  is  a  specialization 
of  Thermal-object,  but  not  a  possible  model  of  it. 


2.5.4  Difference  between  component  and  model  fragment 
classes 

Thus  far  we  have  been  talking  about  component  classes  and  model  fragment  classes 
as  separate  types  of  classes.  But  what  exactly  is  the  difference?  The  answer  is  that, 
fundamentally,  there  is  no  difference!  Both  model  fragment  classes  and  component 
classes are partial descriptions. For example, while the Resistor model fragment
class is a partial description of electrical conduction, the Wire component class
is a partial description of what it means for an object to be a wire.

The  only  difference  between  component  classes  and  model  fragment  classes  is  their 
position  in  the  possible  models  hierarchy.  Component  classes  are  the  classes  that  are 
at  the  top  of  the  possible  models  hierarchy,  i.e.,  component  classes  are  not  models  of 
any other class. Therefore, component classes can be viewed as primitive descriptions.
The  decision  to  model  an  object  as  an  instance  of  a  component  class  is,  therefore,  the 
responsibility  of  the  human  user  providing  the  input  (the  structural  description),  and 
is  outside  the  scope  of  the  model  selection  program. 

An  interesting  consequence  of  the  above  observation  is  that  a  human  user  may 
choose  to  define  the  structure  of  the  device  in  terms  of  model  fragment  classes,  rather 
than just component classes. For example, the user may use an instance of the
Electrical-conductor  model  fragment  class  as  part  of  a  device.  The  ability  to 
specify  structural  descriptions  using  model  fragment  classes  provides  the  user  with 
a  valuable  abstraction  tool.  This  is  useful,  for  example,  during  design,  where  the 
designer  may  know  that  there  is  an  electrical  conductor  at  some  place  in  the  device, 
without  knowing  what  specific  component  implements  this  electrical  conductor. 


2.6  Summary 

In  this  chapter  we  defined  a  model  to  be  a  set  of  model  fragments,  where  a  model 
fragment  is  a  set  of  algebraic,  qualitative,  and/or  ordinary  differential  equations, 
describing some phenomena at some level of detail. Viewing a model as a set of
model  fragments  is  useful  because  model  fragments  are  easier  to  construct  and  more 
reusable  than  complete  models.  In  addition,  the  set  of  applicable  model  fragments 
is  an  implicit  description  of  an  exponentially  large  space  of  possible  models.  The  set 
of  applicable  model  fragments  is  defined  by  the  device  structure  and  a  component 
library.  The  component  library  specifies  the  model  fragments  that  can  be  used  to 
model  each  component  of  the  device. 

We  introduced  two  important  relations  between  model  fragments:  contradictory 
and approximation. Model fragments related by the contradictory relation make
contradictory  assumptions  about  the  domain.  In  addition  to  being  mutually  contra¬ 
dictory,  model  fragments  can  differ  in  the  relative  accuracy  with  which  they  model 
phenomena.  The  relative  accuracy  of  model  fragments  is  represented  using  the  approx¬ 
imation  relation.  In  addition  to  these  two  relations,  we  also  introduced  assumption 
classes, which are sets of mutually contradictory model fragments that describe the
same phenomena.

Finally,  we  concluded  this  chapter  with  a  discussion  of  the  actual  representational 
mechanisms  we  use  to  implement  the  above  ideas.  In  particular,  we  introduced  a 
class level representation of components and model fragments and showed how these
classes are organized. In this representation, a model fragment is the result of making
a component an instance of the corresponding model fragment class. Component
and model fragment classes are organized into two hierarchies: a generalization
hierarchy and a possible models hierarchy. These hierarchies lead to more compact
representations,  and  facilitate  knowledge  base  maintenance. 


Chapter  3 

Adequate  models 


In  this  chapter  we  discuss  the  adequacy  of  device  models.  The  adequacy  of  a  device 
model  is  fundamentally  determined  by  the  task  that  needs  to  be  solved.  We  will  define 
the  adequacy  of  a  model  with  respect  to  the  task  of  generating  causal  explanations 
for  a  phenomenon  of  interest.  We  also  show  that  additional  constraints  on  model 
adequacy  can  stem  from  the  structure  and  the  behavior  of  the  device.  Finally,  we 
define  model  simplicity  based  on  the  intuition  that  modeling  fewer  phenomena,  more 
approximately,  leads  to  simpler  models.  An  adequate  model  is  required  to  be  as 
simple  as  possible. 


3.1  Tasks  and  models 

The  adequacy  of  a  model  is  closely  tied  to  the  task  for  which  the  model  is  to  be  used. 
Simulations  carried  out  during  the  final  stages  of  the  detailed  design  of  a  device  require 
the  use  of  high  fidelity  models  that  incorporate  accurate,  quantitative  descriptions 
of  all  significant  phenomena.  For  example,  a  high  fidelity  model  of  the  temperature 
gauge  shown  in  Figure  1.1  would  include  a  quantitative,  nonlinear  equation  describing 
the  dependence  of  the  thermistor’s  resistance  on  its  temperature. 

On  the  other  hand,  models  that  support  analysis  during  the  initial,  conceptual 
design  of  a  device  can  be  much  coarser.  For  example,  during  the  conceptual  design 
of  the  temperature  gauge,  it  is  sufficient  to  use  a  qualitative  model  [Bobrow,  1984]  of 
the thermistor, which states that the thermistor’s resistance is inversely proportional
to  its  temperature. 

Similarly, Hamscher [Hamscher, 1988, page 11] argues that:

For  complex  devices  the  model  of  the  target  device  should  be  constructed 
with  the  goal  of  troubleshooting  explicitly  in  mind. 

He  then  presents  a  set  of  representation  and  modeling  principles  that  assist  the  effi¬ 
cient  diagnosis  of  complex  digital  circuits  [Hamscher,  1988;  Hamscher,  1991].  These 
principles  are  an  informal  specification  of  the  adequacy  of  a  model  with  respect  to 
the  task  of  diagnosis. 

In this thesis, we define the adequacy of a model with respect to the task of generating
causal explanations for phenomena of interest. In the next section we discuss the
importance of this task, both as a vehicle for communication and as an important
subtask for other tasks such as analysis, diagnosis, and design.


3.2  Causal  explanations 

Causation and causal reasoning are ubiquitous in human reasoning. People are always
asking  why  something  happened,  expecting  some  sort  of  a  causal  explanation  in  reply. 
However, while the notion of causation seems intuitively clear to everyone, providing
a  good  definition  for  it  has  not  been  easy.  Philosophers  have  argued  about  the  true 
nature  of  causation  for  a  long  time  (e.g.,  see  [Mackie,  1974]).  In  this  thesis  we  choose 
not  to  get  mired  in  this  debate.  Instead,  we  take  the  view,  common  in  Artificial 
Intelligence  [Bobrow,  1984;  Iwasaki  and  Simon,  1986b;  Patil  et  al.,  1981;  Pople,  1982; 
Rieger and Grinberg, 1977; Shoham, 1985; Wallis and Shortliffe, 1982; Weiss et al.,
1978],  that  causal  explanations  are  explanations  of  phenomena  based  on  a  set  of 
underlying  mechanisms,  that  are  assumed  to  provide  a  description  of  how  (the  relevant 
aspect  of)  the  world  really  works,  i.e.,  these  mechanisms  are  assumed  to  be  causal 
mechanisms.  (See  [Nayak,  1989]  for  an  overview  of  this  literature.) 
3.2.1  Importance  of  causal  explanations

Causal  explanations  play  an  important  role  in  automated  reasoning  systems  as  a 
vehicle  for  the  system  to  communicate  with  its  human  user.  Such  explanations  can  be 
used  for  instructional  purposes,  as  in  various  Intelligent  Computer  Aided  Instruction 
systems [Brown et al., 1982; Forbus and Stevens, 1981; Weld, 1983], or as a method
for explaining the system’s line of reasoning to a human user [Patil et al., 1981;
Weiss et al., 1978; Wallis and Shortliffe, 1982].

In  addition  to  their  role  in  communication,  causal  explanations  play  a  central  role 
in  focusing  other  forms  of  reasoning  [Weld  and  de  Kleer,  1990].  Causal  explanations 
are  used  in  diagnosis  to  focus  the  reasoning  only  on  those  elements  that  could  have 
caused  a  particular  symptom  [Davis,  1984].  Causal  explanations  focus  design  and 
redesign  by  focusing  the  reasoning  on  just  those  mechanisms  that  can  produce  the 
desired  behavior  [Williams,  1989;  Williams,  1990].  Causal  explanations  can  also  guide 
quantitative  analysis  by  providing  an  overall  structure  for  solving  the  problem  at  hand 
[de  Kleer,  1977]. 


3.2.2  Types  of  causal  explanations

Causal  explanations  are  generated  by  stringing  together  causal  relations  of  the  form 
“x  causes  y.”  Different  types  of  causal  explanations  are  generated  depending  on  the 
particular  vocabulary  used  for  modeling  these  causal  relations,  i.e.,  the  types  of  “x” 
and  “y”  and  the  meaning  of  the  “causes”  relation.  In  many  medical  diagnosis  systems 
(e.g.,  CASNET  [Weiss  et  al.,  1978],  CADUCEUS/INTERNIST  [Pople,  1982],  ABEL  [Patil 
et  al.,  1981])  the  causal  relation  relates  different  possible  states  of  a  patient,  while  the 
causal  relation  itself  represents  the  likelihood  of  observing  the  effect  given  the  cause. 
A similar approach is used in Bayesian networks, where the causal relation represents
conditional probabilities between random variables [Pearl, 1988]. Rieger and
Grinberg, in their work on understanding physical mechanisms [Rieger and Grinberg,
1977], identify 10 different types of causal relations that relate events like actions,
tendencies, states, and state changes. Shoham’s logical account of causation [Shoham,
1988] relates temporal propositions, with the causal relation being an INUS condition
[Mackie, 1974], i.e., the cause is an Insufficient but Necessary condition of an
Unnecessary  but  Sufficient  condition  for  the  effect. 

In  this  thesis  we  adopt  the  representation  of  the  causal  relation  widely  used  in 
the  literature  on  qualitative  reasoning  about  physical  systems  [Weld  and  de  Kleer, 
1990].  In  this  representation,  the  causal  relation  relates  parameters  used  to  model  the 
physical  system,  and  the  causal  relation  itself  represents  a  dependence  of  the  value 
of  the  “effect  parameter”  on  the  “cause  parameter.”  We  discuss  this  in  detail  in  the 
next  section. 


3.3  Causal  ordering 

The  causal  relation  between  the  parameters,  introduced  above,  is  a  transitive  relation 
that  induces  an  ordering  on  the  parameters  called  a  causal  ordering.  The  dependency 
of  the  “effect  parameter”  on  the  “cause  parameter”  in  such  a  causal  ordering  takes 
one  of  two  forms:  functional  dependency  and  integration. 

The functional dependency of a parameter $p_1$ on a parameter $p_2$ corresponds to
a causal mechanism that “instantaneously” determines the value of $p_1$ as a function
of the value of $p_2$ (and, possibly, some other parameters). We have quoted the word
“instantaneously” to emphasize that what counts as “instantaneously” is a modeling
decision  related  to  the  time  scale  of  interest  [Iwasaki,  1988;  Kuipers,  1987].  For 
example,  at  a  time  scale  of  minutes,  a  thermistor’s  resistance  is  functionally  dependent 
on  its  temperature;  a  change  in  the  temperature  can  be  viewed  as  instantly  causing 
a  change  in  the  resistance.  However,  at  a  much  smaller  time  scale  one  can  actually 
observe  a  delay  in  the  change  in  resistance  due  to  the  change  in  temperature.  Causal 
relations  as  functional  dependencies  have  been  studied  in  [de  Kleer  and  Brown,  1984; 
Williams,  1984;  Iwasaki  and  Simon,  1986b]  and  in  [Forbus,  1984],  where  they  are 
called  indirect  influences. 

The  other  type  of  causal  relations  between  parameters  is  the  integration  relation 
between  a  parameter  and  its  derivative.  In  contrast  to  functional  dependencies  that 
act instantaneously, the integration relation acts over a period of time. For example,
the total amount of charge stored in a capacitor depends on the net flow of current
into the capacitor over a period of time; the amount of stored charge is calculated by
integrating the current flow over that period of time. Causal relations as integration
have been studied in [Iwasaki, 1988] and in [Forbus, 1984], where they are called direct
influences.

3.3.1  Loops  in  the  causal  ordering 

As  mentioned  above,  the  causal  relation  between  parameters  is  transitive.  However, 
we do not insist that the causal relation be anti-symmetric, i.e., a parameter $p_1$
can simultaneously causally depend on, and can causally determine, a parameter $p_2$.
Such  loops  in  the  causal  ordering  are  manifestations  of  feedback  in  the  behavior  of 
the  physical  system.  The  proper  handling  of  such  feedback,  and  the  resulting  loops  in 
the  causal  ordering,  is  the  focus  of  much  debate  and  ongoing  research  [Bobrow,  1984; 
Iwasaki  and  Simon,  1986b;  de  Kleer  and  Brown,  1986;  Iwasaki  and  Simon,  1986a; 
Rose  and  Kramer,  1991].  In  this  thesis  we  adopt  the  (somewhat  neutral)  viewpoint, 
advanced  in  [Iwasaki  and  Simon,  1986b],  of  merely  viewing  such  feedback  as  a  set  of 
interdependent  parameters. 

3.3.2  Equations 

The  causal  ordering  of  a  set  of  parameters  used  to  model  a  physical  system  is  derived 
from a set of algebraic, qualitative and/or differential equations describing the physical
system. Equations, as such, can be viewed as acausal representations of domain
mechanisms. For example, the equation $V = iR$ (Ohm's law) is an acausal representation
of a mechanism for electrical conduction. It merely states that the voltage
across an electrical conductor, $V$, is proportional to the current through the conductor,
$i$, with the resistance of the conductor, $R$, being the proportionality constant.
However,  it  makes  no  causal  claims  like  “the  voltage  depends  on  the  current.” 

To have a causal import, equations must be causally oriented. A causally oriented
equation represents the fact that one of the parameters of the equation is
directly causally dependent on the other parameters of the equation. The dependent
parameter is said to be causally determined by the equation. For example, the acausal
equation  V  =  iR  can  be  causally  oriented  so  that  it  causally  determines  V,  making 
V  causally  dependent  on  i  and  R. 

The  causal  orientation  of  an  equation  can  be  fixed  a  priori  [Forbus,  1984],  or  it 
can  be  inferred  from  the  equations  comprising  a  model  of  the  system  [de  Kleer  and 
Brown, 1984; Williams, 1984; Iwasaki and Simon, 1986b; Iwasaki, 1988]. Fixing the
causal  orientation  of  each  equation  a  priori  is  overly  restrictive,  since  different  causal 
orientations  are  often  possible.  However,  not  all  causal  orientations  fit  our  intuitions 
about  causality.  For  example,  the  equation  V  =  iR  can  be  causally  oriented  in  one 
of  two  ways;  either  V  can  be  causally  dependent  on  i  and  R,  or  i  can  be  causally 
dependent  on  V  and  R.  However,  the  third  possibility,  R  being  causally  dependent 
on  V  and  i,  makes  no  sense  because,  in  an  ordinary  electrical  conductor,  there  is  no 
way  that  changing  V  and/or  i  can  cause  a  change  in  R. 

The set of allowed causal orientations of an equation, $e$, can be represented by the
set, $P_c(e)$, of parameters that can be causally determined by $e$. As a typographical
aid, parameters that can be causally determined by an equation will be typeset in
boldface, e.g., $\mathbf{V} = \mathbf{i}R$ says that this equation can causally determine $V$ and $i$ but
not $R$. We extend the function $P_c$ to a set $E$ of equations in the natural way:

$$P_c(E) = \bigcup_{e \in E} P_c(e) \tag{3.1}$$

Similarly, we extend $P_c$ to a model $M$ as follows (recall that a model fragment $m \in M$
is just a set of equations):

$$P_c(M) = \bigcup_{m \in M} P_c(m) \tag{3.2}$$
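The bookkeeping behind $P_c$ can be sketched in a few lines of Python. This is only an illustration, not part of the formal development: the `Equation` class, its field names, and the second equation `exogenous(R)` are assumptions introduced here, and a model fragment is represented simply as a list of equations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Equation:
    """An acausal equation together with its allowed causal orientations."""
    name: str
    params: frozenset        # all parameters appearing in the equation
    determinable: frozenset  # P_c(e): parameters the equation may causally determine


def pc_of_equations(equations):
    """P_c(E): union of P_c(e) over the equations e in E (Equation 3.1)."""
    result = set()
    for e in equations:
        result |= e.determinable
    return result


def pc_of_model(model):
    """P_c(M): union of P_c(m) over the model fragments m in M (Equation 3.2)."""
    result = set()
    for fragment in model:
        result |= pc_of_equations(fragment)
    return result


# Ohm's law may causally determine V or i, but not R.
ohm = Equation("V = iR", frozenset({"V", "i", "R"}), frozenset({"V", "i"}))
# Hypothetical second fragment: the resistance is fixed from outside the model.
fixed_r = Equation("exogenous(R)", frozenset({"R"}), frozenset({"R"}))

print(sorted(pc_of_equations([ohm])))           # ['V', 'i']
print(sorted(pc_of_model([[ohm], [fixed_r]])))  # ['R', 'V', 'i']
```

Storing the allowed orientations with each equation, rather than one fixed orientation, keeps the choice of orientation open until the whole model is known.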

In addition, let $P(e)$ be the set of all parameters in equation $e$. Extend $P$ to a set $E$
of equations, and to a model $M$, as follows:

$$P(E) = \bigcup_{e \in E} P(e) \tag{3.3}$$

$$P(M) = \bigcup_{m \in M} P(m) \tag{3.4}$$


3.3.3  Computing the causal ordering

As  mentioned  earlier,  the  causal  ordering  of  a  set  of  parameters  is  derived  from  the 
set  of  equations  representing  a  model  of  the  system  under  consideration.  Iwasaki  and 
Simon provide an algorithm for computing the causal ordering [Iwasaki and Simon,
1986b;  Iwasaki,  1988].  However,  that  algorithm  is  a  worst-case  exponential  time 
algorithm.  In  this  section  we  describe  an  efficient  algorithm  for  computing  the  causal 
ordering  based  on  the  work  of  Serrano  and  Gossard  [Serrano  and  Gossard,  1987]. 

Serrano  and  Gossard  make  the  key  observation  that,  given  a  set  of  equations,  the 
causal  ordering  of  the  parameters  can  be  generated  by  (a)  causally  orienting  each 
equation  such  that  each  parameter  is  causally  determined  by  exactly  one  equation; 
and  (b)  taking  the  transitive  closure  of  the  direct  causal  dependency  links  entailed 
by  the  causal  orientations.* 


Causal  mappings 

We  formalize  Serrano  and  Gossard’s  observation  by  first  defining  a  causal  mapping: 

Definition 3.1 (Causal mapping) Let $E$ be a set of equations. A function $F :
E \to P(E)$ is said to be a causal mapping if and only if (a) $F$ is 1-1; and (b) for each
$e \in E$, $F(e) \in P_c(e)$. $F$ is an onto causal mapping if for each parameter $p \in P(E)$,
there is an equation $e \in E$ such that $F(e) = p$.

Hence,  a  causal  mapping  causally  orients  each  equation  such  that  each  parameter  is 
causally  determined  by  at  most  one  equation,  while  an  onto  causal  mapping  causally 
determines  every  parameter.  A  causal  mapping  is  said  to  be  partial  if  it  is  not  defined 
on  every  equation.^ 

* Serrano and Gossard do not actually talk about causal ordering or causal orientations. They are
interested in efficiently evaluating a set of constraints. However, the parameter dependencies that
they generate are identical to the causal ordering, and their algorithm can be viewed as causally
orienting each equation. Hence, we attribute the above observation to them.

^Hence, causal mappings as defined in Definition 3.1 are more precisely named total causal
mappings. However, for simplicity, we shall assume that all causal mappings are total, unless we
explicitly mention them to be partial.

Note that the co-domain of $F$ in the above definition is $P(E)$ and not $P_c(E)$,
even though condition (b) guarantees that the range of $F$ is a subset of $P_c(E)$. We
have chosen $P(E)$ as the co-domain of $F$ to ensure that when $F$ is onto then each
parameter in $P(E)$ is causally determined by an equation in $E$.
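The conditions of Definition 3.1 are easy to check mechanically. The following Python sketch does so for a total mapping; the dictionary-based representation and the three-equation example built around Ohm's law are assumptions made purely for illustration.

```python
def is_causal_mapping(F, pc):
    """Check Definition 3.1 for a total mapping.

    F  : dict mapping each equation name to the parameter it causally determines.
    pc : dict mapping each equation name e to the set P_c(e).
    """
    is_total = set(F) == set(pc)                            # defined on every equation
    is_one_to_one = len(set(F.values())) == len(F)          # condition (a)
    is_allowed = all(e in pc and F[e] in pc[e] for e in F)  # condition (b)
    return is_total and is_one_to_one and is_allowed


def is_onto(F, all_params):
    """An onto causal mapping causally determines every parameter in P(E)."""
    return set(F.values()) == set(all_params)


# Hypothetical set of three equations and their allowed orientations.
pc = {
    "V = iR": {"V", "i"},
    "exogenous(V)": {"V"},
    "exogenous(R)": {"R"},
}
all_params = {"V", "i", "R"}

good = {"V = iR": "i", "exogenous(V)": "V", "exogenous(R)": "R"}
clash = {"V = iR": "V", "exogenous(V)": "V", "exogenous(R)": "R"}  # V determined twice
bad = {"V = iR": "R", "exogenous(V)": "V", "exogenous(R)": "R"}    # R is not in P_c(V = iR)

print(is_causal_mapping(good, pc), is_onto(good, all_params))  # True True
print(is_causal_mapping(clash, pc))                            # False
print(is_causal_mapping(bad, pc))                              # False
```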

Properties  of  causal  mappings 

Let $E$ be a set of equations and let $F : E \to P(E)$ be a (possibly partial) causal
mapping. The direct causal dependencies entailed by $F$ are denoted by $C_F$, and are
defined as follows:

$$C_F = \{(p_1, p_2) : (\exists e \in E)\; F(e) = p_2 \wedge p_1 \in P(e)\} \tag{3.5}$$

In other words, $(p_1, p_2) \in C_F$ if and only if $p_2$ directly causally depends on $p_1$ in the
causal orientations defined by $F$. Denote the transitive closure of $C_F$ by $tc(C_F)$. The
following lemma states that the transitive closures of different onto causal mappings of
$E$ are identical. (We will soon discuss conditions under which onto causal mappings
exist.)

Lemma 3.1 Let $E$ be a set of independent equations, and let $F_1 : E \to P(E)$ and
$F_2 : E \to P(E)$ be onto causal mappings. Then $tc(C_{F_1}) = tc(C_{F_2})$.

Proof: To show that $tc(C_{F_1}) = tc(C_{F_2})$ we need to show that $tc(C_{F_1}) \subseteq tc(C_{F_2})$
and $tc(C_{F_2}) \subseteq tc(C_{F_1})$. We prove the first containment, with the second containment
following by a symmetric argument. To show that $tc(C_{F_1}) \subseteq tc(C_{F_2})$, it suffices to
show that $C_{F_1} \subseteq tc(C_{F_2})$, since $tc(tc(C_{F_2})) = tc(C_{F_2})$.

Let $(q, p) \in C_{F_1}$, and let $e \in E$ be such that $F_1(e) = p$, and hence $q \in P(e)$. We show
that $(q, p) \in tc(C_{F_2})$. There are two cases:

1. If $F_2(e) = p$, then $(q, p) \in C_{F_2}$, and hence $(q, p) \in tc(C_{F_2})$.

2. If $F_2(e) \neq p$, construct the sequence $p_0, p_1, \ldots, p_m$ such that (a) $p_0 = p$; (b)
$p_i = F_2(F_1^{-1}(p_{i-1}))$, for $1 \le i \le m$; (c) $p_m$ is the first repetition in the sequence,
i.e., $p_i \neq p_j$, $0 \le i, j \le (m - 1)$, $i \neq j$, and $p_m = p_i$, for some $i$, $0 \le i \le (m - 1)$.
Such a sequence must exist because $F_1$ and $F_2$ are onto causal mappings, and
because there are a finite number of parameters. In addition, observe that
$m \ge 2$, since if $m = 1$, it follows that $p_0 = F_2(F_1^{-1}(p_0))$, which leads to a
contradiction as follows:

$$p = p_0 = F_2(F_1^{-1}(p_0)) = F_2(F_1^{-1}(p)) = F_2(e)$$

which contradicts the assumption that $F_2(e) \neq p$.

We now show that $p_m = p_0$. Suppose not, so that $p_m = p_i$ for some $i$, $1 \le
i \le (m - 1)$. Hence, it follows that $p_{m-1} = F_1(F_2^{-1}(p_m)) = F_1(F_2^{-1}(p_i)) = p_{i-1}$,
which contradicts condition (c) above. Hence, $p_m = p_0$.

Next, let $e_i = F_1^{-1}(p_{i-1})$, for $1 \le i \le m$. Hence, $p_{i-1} \in P(e_i)$ and $p_i = F_2(e_i)$.
Hence, it follows that $(p_{i-1}, p_i) \in C_{F_2}$. Hence, by transitivity, it follows that
$(p_1, p_m) \in tc(C_{F_2})$; and since $p_m = p_0 = p$, it follows that $(p_1, p) \in tc(C_{F_2})$.

Now there are two cases: (a) if $p_1 = q$, then it follows that $(q, p) \in tc(C_{F_2})$; or
(b) if $p_1 \neq q$, then since $p_1 = F_2(e)$ and $q \in P(e)$, it follows that $(q, p_1) \in C_{F_2}$,
and hence by transitivity $(q, p) \in tc(C_{F_2})$. In either case, $(q, p) \in tc(C_{F_2})$, and
we are done.

□


Intuitively, the above proof shows that if $F_1$ and $F_2$ differ on the parameter to
which an equation $e$ is mapped, then the parameters $F_1(e)$ and $F_2(e)$ are causally
dependent on each other. For example, consider the set of equations, and two different
onto causal mappings $F_1$ and $F_2$, shown in Figure 3.1.

Note that $F_1$ and $F_2$ agree on the parameters assigned to the first two equations,
while they differ on the parameters assigned to the last two equations. However,
under $F_1$, $u$ causally depends on $v$ from the mapping of the third equation while $v$
causally depends on $u$ from the mapping of the fourth equation, i.e., $u$ and $v$ are
interdependent. Similarly, under $F_2$, $u$ causally depends on $v$ from the mapping of
the fourth equation while $v$ causally depends on $u$ from the mapping of the third
equation; once again, $u$ and $v$ are interdependent.

    Equations        F1    F2
    exogenous(x)     x     x
    x = y            y     y
    u + v = y        u     v
    u - v = 0        v     u

Figure 3.1: A set of equations with two onto causal mappings
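Lemma 3.1 can be checked on the equations of Figure 3.1 with a short Python sketch. The representation (equation names as dictionary keys, a naive fixed-point transitive closure) is an assumption made for illustration; `direct_dependencies` follows Equation 3.5 literally, so each determined parameter is also paired with itself.

```python
def direct_dependencies(F, params_of):
    """C_F: pairs (p1, p2) such that some equation e has F(e) = p2 and p1 in P(e)."""
    return {(p1, F[e]) for e in F for p1 in params_of[e]}


def transitive_closure(pairs):
    """Naive fixed-point computation of the transitive closure of a relation."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# P(e) for each equation of Figure 3.1.
params_of = {
    "exogenous(x)": {"x"},
    "x = y": {"x", "y"},
    "u + v = y": {"u", "v", "y"},
    "u - v = 0": {"u", "v"},
}
F1 = {"exogenous(x)": "x", "x = y": "y", "u + v = y": "u", "u - v = 0": "v"}
F2 = {"exogenous(x)": "x", "x = y": "y", "u + v = y": "v", "u - v = 0": "u"}

tc1 = transitive_closure(direct_dependencies(F1, params_of))
tc2 = transitive_closure(direct_dependencies(F2, params_of))

print(tc1 == tc2)                              # True
print(("u", "v") in tc1, ("v", "u") in tc1)    # True True
print(("u", "x") in tc1)                       # False
```

The direct dependencies under the two mappings differ (for instance, $(y, u)$ is direct only under $F_1$), but the closures coincide, with $u$ and $v$ interdependent in both.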

Causal ordering definition

Using  causal  mappings  and  the  above  lemma,  we  now  define  the  causal  ordering  of 
a  set  of  parameters  generated  from  a  set  of  equations.  Before  we  do  this,  we  first 
introduce  the  integration  completion  of  a  set  of  equations.  Recall  that  the  integration 
relation  between  a  parameter  and  its  derivative  constitutes  a  causal  dependency  from 
the derivative to the parameter. We represent this relation with the int equation:
$int(p_1, p_2)$ says that $p_2$ is the derivative of $p_1$. Note that $int(p_1, p_2)$ can be causally
oriented in only one way, to causally determine $p_1$ by integrating the value of $p_2$ over
time. Given a set $E$ of equations, the integration completion of $E$ makes explicit all
such integration links among the parameters of $E$:

Definition 3.2 (Integration completion) Let $E$ be a set of equations. The integration
completion of $E$, denoted $ic(E)$, is defined as follows:

$$ic(E) = E \cup \{int(q, dq/dt) : dq/dt \in P(E)\}$$

i.e., whenever $P(E)$ contains a derivative, the integration completion of $E$ contains an
int equation expressing the integration relation. Note that if $E$ contains no differential
equations, then $E = ic(E)$.
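Definition 3.2 translates almost directly into code. In the Python sketch below, the dictionary layout and the `derivative_of` table (which records which parameter each derivative belongs to) are assumptions made for illustration; the capacitor equation echoes the charge and current example used earlier.

```python
def integration_completion(equations, derivative_of):
    """Return ic(E) for a set of equations E.

    equations     : dict name -> (P(e), P_c(e)), both sets of parameter names.
    derivative_of : dict mapping a derivative parameter (e.g. "dq/dt") to the
                    parameter it is the derivative of (e.g. "q").
    """
    all_params = set()
    for params, _ in equations.values():
        all_params |= params

    completed = dict(equations)
    for dq in sorted(all_params):
        if dq in derivative_of:
            q = derivative_of[dq]
            # int(q, dq/dt) can be oriented in only one way: it determines q.
            completed[f"int({q}, {dq})"] = ({q, dq}, {q})
    return completed


# Hypothetical capacitor fragment: the current fixes the rate of change of charge.
capacitor = {"dq/dt = i": ({"dq/dt", "i"}, {"dq/dt"})}
ic = integration_completion(capacitor, {"dq/dt": "q"})
print(sorted(ic))   # ['dq/dt = i', 'int(q, dq/dt)']

# With no derivatives among the parameters, E = ic(E).
static = {"V = iR": ({"V", "i", "R"}, {"V", "i"})}
print(integration_completion(static, {}) == static)   # True
```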

We now define the causal ordering generated from a set of equations as the transitive
closure of direct causal dependencies generated by any onto causal mapping of
the integration completion of the set of equations:

Definition 3.3 (Causal order) Let $E$ be a set of independent equations, and let
$F : ic(E) \to P(E)$ be an onto causal mapping. The causal order of the parameters of
$E$, denoted $C(E)$, is the transitive closure of $C_F$:

$$C(E) = tc(C_F)$$

The causal ordering is well defined because Lemma 3.1 assures us that the transitive
closures of all onto causal mappings of a set of equations are identical. This allows
us to define the causal ordering of a set of equations as the transitive closure of any
onto causal mapping. The use of $ic(E)$, instead of $E$, in the above definition ensures
that causal dependencies due to integration links are included in the causal ordering.

Next, we investigate conditions under which the causal ordering exists, i.e., conditions
under which an onto causal mapping exists.

Existence  of  onto  causal  mappings 

We start by defining what it means for a set of equations to be complete, overconstrained,
and incomplete. Informally, a set of equations is (a) complete if it has as
many  equations  as  parameters,  and  no  subset  of  equations  has  fewer  parameters  than 
equations;  (b)  overconstrained  if  some  subset  of  equations  has  more  equations  than 
parameters;  and  (c)  incomplete  if  some  subset  of  equations,  that  has  no  parameters  in 
common  with  its  complement,  has  more  parameters  than  equations.  More  precisely, 
we  have  the  following  definitions: 

Definition 3.4 Let $E$ be a set of independent equations.^

•  $E$ is said to be complete if and only if (a) $|ic(E)| = |P_c(ic(E))| = |P(E)|$; and
(b) for every $S \subseteq ic(E)$, $|S| \le |P_c(S)|$.

•  $E$ is said to be overconstrained if and only if there exists $S \subseteq ic(E)$ such that
$|S| > |P_c(S)|$.

•  $E$ is said to be incomplete if and only if there exists $S \subseteq ic(E)$ such that either
(a) $P_c(S) \cap P_c(ic(E) \setminus S) = \emptyset$ and $|S| < |P_c(S)|$; or (b) $P(S) \cap P(ic(E) \setminus S) = \emptyset$
and $|S| < |P(S)|$.

^“$|\cdot|$” returns the cardinality of a set. “\” is the set difference operator.

We now show that an onto causal mapping exists if and only if the set of equations
is complete. This means that when a set of equations is complete, all the parameters
in the equations can be causally determined.

Lemma 3.2 Let $E$ be a set of independent equations. Then there exists an onto
causal mapping $F : ic(E) \to P(E)$ if and only if $E$ is complete.

Proof:  To  prove  this  lemma,  we  start  by  defining  a  bipartite  graph  representing  the 
set  of  equations  E. 

Definition 3.5 Let $E$ be a set of independent equations. Let $G = (X, Y, R)$ be a
bipartite graph such that $X \cup Y$ is the set of nodes, $R$ is the set of edges, and each
edge connects a node in $X$ to a node in $Y$. $G$ is said to represent $E$ if and only if^

1. $X = ic(E)$, i.e., there is a node in $X$ for each equation, including the integration
equations;

2. $Y = P(E)$, i.e., there is a node in $Y$ for each parameter; and

3. $(x, y) \in R$ if and only if $x \in X (= ic(E))$, $y \in Y (= P(E))$, and $y \in P_c(x)$, i.e.,
an equation is connected to a parameter if and only if the equation can causally
determine the parameter.

For example, the bipartite graph representing the equations shown in Figure 3.1
is shown in Figure 3.2.

A matching in a bipartite graph is a set of edges such that no two edges in the
matching share a common node. A matching is said to be complete if and only if each
node in the graph is covered by an edge in the matching, i.e., each node has an edge
in the matching incident upon it. For example, a complete matching in the bipartite
graph of Figure 3.2 consists of the following edges:

$$\{(exogenous(x), x),\ (x = y,\ y),\ (u + v = y,\ u),\ (u - v = 0,\ v)\}$$

^This representation of the set of equations is due to Serrano and Gossard [Serrano and Gossard,
1987].

Figure 3.2: Graph representing a set of equations

From the above definitions, it follows that an onto causal mapping $F : ic(E) \to
P(E)$ corresponds to a complete matching in the bipartite graph representing $E$, and
vice versa. In particular, the complete matching corresponding to an onto causal
mapping $F : ic(E) \to P(E)$ is the following set of edges:

$$\{(e, F(e)) : e \in ic(E)\}$$

Hence, it follows that an onto causal mapping $F : ic(E) \to P(E)$ exists if and
only if a complete matching exists for the bipartite graph representing $E$. Now,
Hall's theorem [Even, 1979, pages 137-138] tells us that a bipartite graph
$G = (X, Y, R)$ contains a complete matching if and only if (a) $|X| = |Y|$; and (b) for
every $A \subseteq X$, $|A| \le |R(A)|$, where $R(A)$ denotes the set of nodes connected to the
nodes in $A$ by edges in $R$. From Definition 3.5, condition (a) is equivalent
to saying $|ic(E)| = |P(E)|$, and condition (b) is equivalent to saying that for every
$S \subseteq ic(E)$, $|S| \le |P_c(S)|$. But, from Definition 3.4, this is equivalent to saying that
$E$ is a complete set of equations. Hence, it follows that there exists an onto causal
mapping $F : ic(E) \to P(E)$ if and only if $E$ is complete. □

Causal  ordering  algorithm 

The  proof  of  the  above  lemma  leads  directly  to  the  efficient  causal  ordering  algorithm 
based  on  Serrano  and  Gossard’s  work  [Serrano  and  Gossard,  1987].  This  algorithm 
is  shown  in  Figure  3.3. 

function find-causal-order(E)
    1. Using Definition 3.5 construct G, the
       bipartite graph representing E;
    2. Construct a maximum matching for G;
    3. if the above matching is complete then
           a. Let F be the corresponding onto causal mapping;
           b. return the transitive closure of the direct causal
              dependencies entailed by F
       else
           c. return nil  /* No onto causal mapping exists */
       endif
end

Figure 3.3: Causal ordering algorithm

In this algorithm, step 1 uses Definition 3.5 to construct the bipartite graph representing
$E$. Step 2 constructs a maximum matching in this graph. A maximum
matching is a matching with maximum cardinality. If $n$ is the number of nodes, and
$e$ the number of edges, in a bipartite graph, a maximum matching in the graph can be
constructed in $O(\sqrt{n}\,e)$ time using algorithms for finding maximum flow in networks, e.g.,
see [Even, 1979, pages 135-138]. Appendix D gives a brief overview of this algorithm.
Step 3 checks whether or not this matching is complete. Note that if a complete
matching exists then it is a maximum matching. Conversely, if a complete matching
exists then any maximum matching is a complete matching. If the matching is
complete, step 3 constructs the corresponding causal mapping and returns its transitive
closure. If the matching is not complete, then no complete matching exists, and the
set of equations is not complete. Hence, the causal ordering is not well defined, and
the above algorithm returns nil.
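The steps of find-causal-order can be sketched in Python as follows. This is only an illustrative sketch under several assumptions: equations are identified by name; `params_of` and `pc` are hypothetical dictionaries giving $P(e)$ and $P_c(e)$ for each equation of an already integration-completed set; the allowed orientations chosen for the Figure 3.1 equations are assumed; and the matching is found with a simple augmenting-path search rather than the maximum-flow algorithm cited above, so it does not attain the $O(\sqrt{n}\,e)$ bound.

```python
def maximum_matching(pc):
    """Augmenting-path bipartite matching; returns dict equation -> parameter."""
    match_of_param = {}

    def try_assign(eq, visited):
        # Try to give eq a parameter, re-assigning earlier equations if needed.
        for p in sorted(pc[eq]):
            if p in visited:
                continue
            visited.add(p)
            if p not in match_of_param or try_assign(match_of_param[p], visited):
                match_of_param[p] = eq
                return True
        return False

    for eq in pc:
        try_assign(eq, set())
    return {eq: p for p, eq in match_of_param.items()}


def transitive_closure(pairs):
    """Naive fixed-point computation of the transitive closure of a relation."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def find_causal_order(params_of, pc):
    """Return the causal order as a set of pairs, or None if no complete matching exists."""
    F = maximum_matching(pc)                              # steps 1 and 2
    all_params = set().union(*params_of.values())
    if len(F) != len(pc) or set(F.values()) != all_params:
        return None                                       # step 3c
    direct = {(q, F[e]) for e in F for q in params_of[e]}  # C_F, Equation 3.5
    return transitive_closure(direct)                     # step 3b


# Equations of Figure 3.1, with assumed allowed orientations.
params_of = {"exogenous(x)": {"x"}, "x = y": {"x", "y"},
             "u + v = y": {"u", "v", "y"}, "u - v = 0": {"u", "v"}}
pc = {"exogenous(x)": {"x"}, "x = y": {"x", "y"},
      "u + v = y": {"u", "v"}, "u - v = 0": {"u", "v"}}
order = find_causal_order(params_of, pc)
print(("x", "u") in order, ("u", "x") in order)   # True False

# Four equations over three parameters: no complete matching exists.
pc_bad = {"V_w = 0": {"V_w"}, "V_t = 0": {"V_t"}, "exogenous(V_v)": {"V_v"},
          "V_v = V_w + V_t": {"V_v", "V_w", "V_t"}}
print(find_causal_order(pc_bad, pc_bad))          # None
```

Because any maximum matching is complete whenever a complete matching exists, the choice of matching algorithm affects only the running time, not the result.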

We now illustrate the above algorithm with an example. Figure 3.4 shows a
set of equations describing the temperature gauge shown in Figure 1.1. This set of
equations is exactly the same as the ones shown in Figure 1.3, except that here we
have included knowledge of allowed causal orientations of each equation. Figure 3.5
shows the bipartite graph representing this set of equations. This figure also shows
a maximum matching consisting of the thick edges with arrow heads at each end.
One can see that this set of edges forms a complete matching. Figure 3.6 shows a
graphical representation of the direct causal dependencies generated from the causal
mapping corresponding to the above complete matching. Note, in particular, the
cycle of dependencies between $i_t$, $i_w$, $V_w$, and $V_t$.

    Linkage(bms-1,ptr-1):                   $\theta_p = k_1 x_b$
    Thermal-bms(bms-1):                     $x_b = k_2 T_b$
    Heat-flow(bms-1,atm-1):                 $f_{ba} = k_3 (T_b - T_a)$
    Heat-flow(wire-1,bms-1):                $f_{wb} = k_4 (T_w - T_b)$
    Constant-temperature(atm-1):            $exogenous(T_a)$
    Thermal-equilibrium(bms-1):             $f_{ba} = f_{wb}$
    Thermal-equilibrium(wire-1):            $f_{wb} = f_w$
    Resistor(wire-1):                       $V_w = i_w R_w$
    Constant-resistance(wire-1):            $exogenous(R_w)$
    Thermal-resistance(wire-1):             $f_w = V_w i_w$
    Electrical-thermistor(thermistor-1):    $V_t = i_t R_t$;  $R_t =$ [...]
    Constant-voltage-source(battery-1):     $exogenous(V_v)$
    Kirchhoff's laws:                       $V_v = V_w + V_t$;  $i_v = i_t$;  $i_t = i_w$
    Input:                                  $exogenous(T_t)$

    $\theta_p$: Pointer angle               $x_b$: Bms deflection
    $R_w$: Wire resistance                  $R_t$: Thermistor resistance
    $i_t$: Thermistor current               $V_t$: Thermistor voltage
    $i_w$: Wire current                     $V_w$: Wire voltage
    $i_v$: Battery current                  $V_v$: Battery voltage
    $T_b$: Bms temperature                  $T_a$: Atm temperature
    $T_w$: Wire temperature                 $T_t$: Thermistor temperature
    $f_{ba}$: Heat flow (bms to atm)        $f_{wb}$: Heat flow (wire to bms)
    $f_w$: Heat generated in wire           $k_j$: Exogenous constants

Figure 3.4: A possible model of the temperature gauge

Figure 3.5: Bipartite graph representing the equations in Figure 3.4. The set of thick
edges with arrow heads at each end form a complete matching.

Figure 3.6: The direct causal dependencies generated by the causal mapping corresponding
to the complete matching shown in Figure 3.5.

Miscellaneous observations

In practice, we modify step 3b of function find-causal-order to return $C_F$, the graph
of direct causal dependencies generated by the causal mapping $F$ generated in step
3a, rather than its transitive closure, $C(E)$. For example, the function would then
return graphs like the one shown in Figure 3.6. This has two important advantages:

1. Paths in this graph provide a causal explanation for how one parameter causally
depends on another. For example, while the transitive closure of the graph
shown in Figure 3.6 can tell us that $\theta_p$ causally depends on $T_t$, it is unable to
say that this causal dependence is not a direct causal dependence, i.e., is not due
to a single causal mechanism. On the other hand, the graph in Figure 3.6 can
be used to give a detailed explanation for how $\theta_p$ depends on $T_t$ by identifying
the different causal mechanisms that mediate this dependence.

2. We can use this graph to easily identify the minimal sets of causally interdependent
parameters, without incurring the cost of generating the transitive
closure.® The minimal sets of causally interdependent parameters are precisely
the strongly connected components of the graph. A strongly connected component
of a directed graph is a maximal set of nodes in the graph such that
there exists a directed path from each node in the set to every other node in the
set. An efficient algorithm for generating the strongly connected components
of a directed graph is found in [Even, 1979, pages 64-66]. For example, the set
$\{i_t, i_w, V_w, V_t\}$ forms a strongly connected component of the graph in Figure 3.6,
and hence these parameters are causally interdependent.
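As a sketch of the second point, the strongly connected components of a graph of direct causal dependencies can be computed with Tarjan's algorithm, one standard linear-time method (not necessarily the one given in the cited reference). The edge list below is an assumed orientation of the electrical equations of Figure 3.4, chosen for illustration: an edge (a, b) means that b directly causally depends on a.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm; returns a list of sets of nodes."""
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)

    index, low = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the component off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for n in nodes:
        if n not in index:
            visit(n)
    return components


# Assumed direct dependencies among the electrical parameters of the gauge.
nodes = ["V_v", "R_t", "R_w", "V_t", "i_t", "i_w", "V_w"]
edges = [
    ("V_v", "V_t"), ("V_w", "V_t"),   # V_t determined by V_v = V_w + V_t
    ("V_t", "i_t"), ("R_t", "i_t"),   # i_t determined by V_t = i_t R_t
    ("i_t", "i_w"),                   # i_w determined by i_t = i_w
    ("i_w", "V_w"), ("R_w", "V_w"),   # V_w determined by V_w = i_w R_w
]
components = strongly_connected_components(nodes, edges)
feedback = [c for c in components if len(c) > 1]
print(len(components), len(feedback))   # 4 1
```

The single non-trivial component is the set of mutually dependent parameters; the exogenous parameters each form a component of their own.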

If step 2 results in a maximum matching that is not complete, then the set of
equations is either overconstrained, or incomplete, or both. Following [Serrano and
Gossard, 1987], we state the following without proof:

1. If the maximum matching found in step 2 is such that a node corresponding to
one of the equations in $ic(E)$ is not covered by an edge in the matching, then
the set of equations is overconstrained.

2. If the maximum matching found in step 2 is such that a node corresponding to
one of the parameters in $P(E)$ is not covered by an edge in the matching, then
the set of equations is incomplete.

The proofs of the above statements are similar to the proof of Lemma 3.2.

®One can easily show that these minimal sets of causally interdependent parameters are the
minimal complete subsets identified by the causal ordering algorithm in [Iwasaki and Simon, 1986b].

This concludes our discussion of causal explanations and how they are generated
from a model, i.e., from a set of equations. We now proceed to define the criteria
that we use for model adequacy. The next section introduces the consistency and
completeness of a model; adequate models are required to be consistent and complete.
Section 3.5 introduces our representation for the phenomenon of interest; an
adequate model must be able to provide a causal explanation for the phenomenon
of interest. Sections 3.6 and 3.7 introduce constraints stemming from the structural
and behavioral contexts of a physical system, that must be satisfied by an adequate
model. Finally, Section 3.8 will introduce a simplicity ordering on the set of models,
with an adequate model being a simplest model that satisfies all the above criteria.


3.4  Consistency  and  completeness  of  models 

In this section we define the two notions of model consistency and model completeness.
Recall  from  Chapter  2,  that  a  model  can  be  viewed  in  one  of  two  ways:  (a)  as  a  set 
of  model  fragments  (Section  2.3.2);  and  (b)  as  a  set  of  equations  (Section  2.1.2).  Our 
definitions  of  model  consistency  and  model  completeness  will  be  based  on  knowledge 
stemming  from  both  these  viewpoints. 

3.4.1  Model  consistency 

Recall  that  when  two  model  fragments  make  contradictory  assumptions  about  the 
domain  they  are  related  by  the  contradictory  relation  (Section  2.3.4).  Therefore, 
the  use  of  contradictory  model  fragments  in  a  model  is  undesirable.  Similarly,  in 
Definition  3.4  we  defined  the  notion  of  an  overconstrained  set  of  equations.  If  a  set 
of  independent  equations  is  overconstrained,  then  the  equations  have  no  solutions,® 
leading  to  a  contradiction. 

The  above  observations  lead  directly  to  our  definition  of  a  consistent  model: 

Definition 3.6 (Consistent model) A model $M$ is said to be consistent if and only
if the following two conditions are satisfied:

1. $\forall m_1, m_2 \in M\ \neg Contradictory(m_1, m_2)$, i.e., the model does not contain mutually
contradictory model fragments;

2. The set of equations of $M$ is not overconstrained.

®Being independent, the possibility of the equations being merely redundant is ruled out.

An  immediate  consequence  of  the  above  definition  is  that  a  consistent  model  can 
have  at  most  one  model  fragment  from  each  assumption  class.  Consistency  is  the 
first  important  property  of  an  adequate  model;  a  model  that  makes  contradictory 
assumptions  about  the  domain,  or  whose  equations  are  inconsistent,  is  undesirable. 
Hence,  we  have; 

•  An  adequate  model  must  be  consistent. 

For example, any consistent model of the temperature gauge in Figure 1.1 cannot
simultaneously model the wire both as an Ideal-conductor and as a Resistor
because these two model fragment classes contradict each other. Similarly, no
consistent model of the temperature gauge will model both the wire and the thermistor
as Ideal-conductors and the battery as a Constant-voltage-source. This is
because this set of modeling choices would lead to the following overconstrained set of
equations:

$V_w = 0$

$V_t = 0$

$\mathit{exogenous}(V_b)$

$V_b = V_w + V_t$

3.4.2  Model  completeness 

Recall that model fragments are partial descriptions of phenomena. Additional model
fragments, drawn from the set of required assumption classes, are needed to complete
this description (Section 2.3.4). A complete model must include complete descriptions
of all phenomena that are being modeled. Hence, a complete model must
include model fragments from all required assumption classes. In addition, we will
require that the equations of a complete model be able to causally determine all the
parameters of the model. From Lemma 3.2 we know that when a set of equations is
complete then all the parameters can be causally determined, and the causal ordering
is well defined. These observations lead directly to our definition of a complete model:

Definition  3.7  (Complete  model)  A  model  M  is  said  to  be  complete  if  and  only 
if  the  following  two  conditions  are  satisfied: 

1. $(\forall m \in M)\ \mathit{requires}(m, A) \Rightarrow (\exists m' \in A)\ m' \in M$, i.e., the model contains a
model fragment from each required assumption class; and

2.  The  set  of  equations  of  M  is  complete. 
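
As a minimal sketch of Definition 3.7, the check below takes the requires relation and the equation-completeness test as caller-supplied functions. Both helpers, the (class, component) encoding of fragments, and the example requirement that a Resistor needs a fragment from a resistance assumption class are assumptions made for illustration only.

```python
def is_complete(model, required_classes, equations_complete):
    """Check the two conditions of Definition 3.7.

    required_classes(m) returns the assumption classes required by fragment m,
    each given as a collection of fragments; equations_complete(model) stands
    in for the completeness test on the model's equations. Both are
    hypothetical, caller-supplied helpers.
    """
    model = set(model)
    # Condition 1: every required assumption class contributes a fragment.
    for m in model:
        for assumption_class in required_classes(m):
            if model.isdisjoint(assumption_class):
                return False
    # Condition 2: the set of equations of the model is complete.
    return equations_complete(model)


# Illustrative (assumed) requirement: a Resistor needs some resistance model.
RESISTANCE_CLASS = {("Constant-resistance", "wire-1"),
                    ("Temperature-dependent-resistor", "wire-1")}


def required_classes(fragment):
    return [RESISTANCE_CLASS] if fragment == ("Resistor", "wire-1") else []


partial = {("Resistor", "wire-1")}
filled = partial | {("Constant-resistance", "wire-1")}

print(is_complete(partial, required_classes, lambda m: True))
print(is_complete(filled, required_classes, lambda m: True))
```

The `partial` model is rejected because no fragment from the required class is present, regardless of the state of its equations.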

Completeness is the second important property of an adequate model; an adequate
model must include complete descriptions of all phenomena that are being
modeled, and the model’s equations must be complete so that we can generate causal
explanations for phenomena of interest. Hence, we have:

•  An  adequate  model  must  be  complete. 

For  example,  the  model  shown  in  Figure  3.4  is  complete.^ 


3.5  Representing  the  phenomenon  of  interest 

Toward the end of Section 3.1 we stated that, in this thesis, we will define the adequacy
of a model with respect to the task of generating causal explanations for a phenomenon
of interest. Hence, the phenomenon of interest is a crucial input that focuses model
selection. We call the phenomenon of interest the expected behavior. The expected
behavior of a device is an abstract description of what the system does (but not how
it does it). The causal explanation generated by a model is a description of how the
expected behavior is achieved.

The expected behavior captures, in part, what is commonly referred to as the
function of a device. For example, stating that the device in Figure 1.1 is a temperature
gauge indicates that the device model must explain how the temperature of
^Though  we  haven’t  shown  the  requires  constraints,  in  fact  they  are  all  satisfied  in  this  model. 
the thermistor determines the angular position of the pointer. Expected behaviors
can also provide abstract descriptions of device behaviors that would not normally
be considered the device’s primary function. Such knowledge of the expected behavior
is commonplace and almost always available either directly from the user, from
the description of the problem to be solved, or from the context in which the device
operates.

For example, a student wanting to understand how a device works can provide an
intelligent tutoring system with a description of the expected behavior that he or she
wants explained. Similarly, an automated diagnosis program that diagnoses faults in
a device must first be provided with a description of what the correctly working
device is supposed to do. Finally, device names, such as light bulb, vacuum cleaner,
and disk drive, are widely used and all are associated with expected behaviors. The
most common expected behavior descriptions are input/output descriptions of device
behavior.

Following  our  discussion  of  causal  ordering  in  Section  3.3,  we  specify  expected 
behaviors  as  a  query  that  requests  a  causal  explanation  for  how  one  parameter  causally 
depends  on  another.  For  example,  the  expected  behavior  of  the  temperature  gauge 
shown  in  Figure  1.1,  representing  its  primary  function,  is: 

$\mathit{causes}(T_t, \theta_p)$

where $T_t$ is the temperature of the thermistor and $\theta_p$ is the angular position of the
pointer. This expected behavior requests a causal explanation for how the temperature
of the thermistor causally determines the angular position of the pointer.

The  expected  behavior  provides  us  with  our  most  important  criterion  for  model 
adequacy: 

• An adequate model must explain the expected behavior, i.e., a model is adequate
with respect to an expected behavior, $\mathit{causes}(p_1, p_2)$, if it is able to provide a
causal explanation for how $p_2$ causally depends on $p_1$.

Given such an expected behavior, one can use the procedures described in Section 3.3
to check whether or not a device model is able to provide an explanation for
how the second parameter causally depends on the first parameter. This procedure
is briefly summarized in Figure 3.7.


function check-expected-behavior(M, p1, p2)
    /* M is a model, assumed to be consistent and complete */
    /* causes(p1, p2) is the expected behavior */

    1. Let E be the equations of M
       /* Section 2.3.3 describes how to do this */

    2. Compute C(E), the causal ordering generated from E
       /* Section 3.3 describes how to do this */

    3. if (p1, p2) ∈ C(E) then
           /* The expected behavior is satisfied */
           return true
           /* The causal explanation can also be returned (Section 3.3) */
       else
           /* The expected behavior is not satisfied */
           return false
       endif


Figure 3.7: Algorithm for checking whether a model can explain the expected behavior.
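
Step 3 of the procedure in Figure 3.7 can be sketched as a reachability test. The sketch assumes that the causal ordering has already been computed (steps 1 and 2 are not reproduced here) and is supplied as a mapping from each parameter to the parameters it directly influences; it also assumes that $(p_1, p_2) \in C(E)$ is meant to include indirect dependence. The parameter chain used in the example is invented for illustration and is not the ordering of Figure 3.6.

```python
from collections import deque


def check_expected_behavior(ordering, p1, p2):
    """Return True if p2 causally depends on p1 in the given causal ordering.

    ordering maps each parameter to the parameters it directly influences
    (an assumed representation of C(E)); dependence is taken to be transitive.
    """
    seen = {p1}
    queue = deque([p1])
    while queue:
        p = queue.popleft()
        for q in ordering.get(p, ()):
            if q == p2:
                return True
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return False


# Hypothetical causal ordering for a temperature gauge (names are invented).
gauge_ordering = {
    "T_t": ["R_t"],          # thermistor temperature -> thermistor resistance
    "R_t": ["i"],            # resistance -> circuit current
    "i": ["T_strip"],        # current -> bimetallic strip temperature
    "T_strip": ["theta_p"],  # strip temperature -> pointer angle
}

print(check_expected_behavior(gauge_ordering, "T_t", "theta_p"))
print(check_expected_behavior(gauge_ordering, "theta_p", "T_t"))
```

A breadth-first search is used so that the path found (if one is recorded) is a shortest causal chain, which may be convenient when the explanation itself is to be returned.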


For example, the model in Figure 3.4 is able to explain the expected behavior

$\mathit{causes}(T_t, \theta_p)$

since $\theta_p$ causally depends on $T_t$ in the causal ordering generated from this model,
shown in Figure 3.6.

It must be noted that our language for expressing the expected behaviors is extremely
simple; it only allows us to ask for explanations for causal dependencies
between parameters. More expressive languages are, of course, desirable. We might
want to include information about the directions of change, e.g., we might want to
say that increasing $T_t$ causes $\theta_p$ to increase. Or we might want to include more
information about the actual functional relationship, e.g., we might want to say that
there is a linear relationship between $T_t$ and $\theta_p$.

However, the price we must pay for using more expressive languages for the expected
behavior is that checking whether or not the expected behavior is satisfied
becomes very expensive, and can often even be impossible. For example, deciding
whether an increase in $T_t$ causes an increase or a decrease in $\theta_p$ with purely
qualitative information is not possible when there are competing influences. Additional
information about the relative magnitudes of these influences is necessary, which may
or may not be available. Hence, we have chosen a simple, though useful, language
for expressing the expected behavior, leading to an efficient algorithm for deciding
whether or not a model satisfies the expected behavior.

Thus  far  we  have  said  that  an  adequate  model  must  be  consistent  and  complete, 
and  must  be  able  to  explain  the  expected  behavior.  In  addition  to  these  constraints,  a 
domain  expert  might  want  to  place  additional  domain-dependent  constraints  on  model 
adequacy.  We  now  investigate  two  important  classes  of  such  constraints,  stemming 
from  the  structural  and  behavioral  contexts  of  the  device.  These  constraints  are 
expressed  using  a  first-order  constraint  language,  and  an  adequate  model  must  satisfy 
all  such  constraints.  Symbols  in  these  constraints  that  begin  with  “?”  are  variables. 
Constraints  are  all  evaluated  with  respect  to  a  component  of  interest,  with  the  variable 
“?object” being bound to that component. All other variables in the constraints are
assumed  to  be  existentially  quantified. 


3.6  Constraints  from  the  structural  context 

In  this  section  we  discuss  the  structural  context,  an  important  source  of  constraints 
on  model  adequacy.  We  will  then  discuss  different  types  of  constraints  that  stem  from 
the  structural  context. 

3.6.1  Structural  context 

The structural context of a device consists of the different aspects of the structure of
the device. Informally, the structure of a device is a description of how the device is
physically put together. It includes the components in the device, the physical
and structural properties of these components, and the structural relations between
these components that describe how they are put together to form the device.

Components 

The components that can be used to describe the structure of a device are drawn from
a library of component types, like the one described in Section 2.5. The particular
choice of components in a component library must reflect (a) the domain of interest;
and (b) the most detailed level of granularity that needs to be reasoned about.
(Section 2.4.2 shows how components at a coarser level of detail can be recognized
automatically.) For example, the components used to describe electronic devices would
differ from the components used to describe chemical plants. Similarly, electronic
devices can be described at multiple levels of detail, ranging from logic gates down
to layers in semiconductor wafers.

Physical  and  structural  properties 

In  addition  to  the  types  of  the  components  in  the  device,  the  structure  of  the  device 
can  also  specify  various  properties  of  these  components.  These  properties  can  be 
broadly  classified  as  physical  and  structural  properties,  and  include  properties  such  as 
shape,  dimensions,  mass,  and  material  composition.  As  with  the  choice  of  component 
types,  the  choice  of  physical  and  structural  properties  of  components  depends  on  the 
domain  and  how  it  is  conceptualized. 

Structural  relations 

Structural relations are relations between components that describe how components
are put together. The most commonly used structural relation is the connected-to
relation, which says that two component terminals are connected to each other [de Kleer
and Brown, 1984]. Other structural relations that we use include coiled-around
(indicating that a wire is coiled around a component), meshed (indicating that a pair of
gears mesh with each other), and immersed-in (indicating that a component is
immersed in a fluid). As with components and their physical and structural properties,
the set of structural relations is crucially dependent upon the domain, and how we
choose to conceptualize it.

Structural  predicates 

Predicates that can be used to describe the structure of a device will be called structural
predicates. In particular, component types will be unary structural predicates,
structural and physical properties of components will be binary structural predicates
(in fact, they will be unary functions), and structural relations will be general n-ary
structural predicates. As discussed above, deciding which predicates are structural
predicates is dependent upon the domain and how it is conceptualized.

Miscellaneous  observations 

The device structure provides an important bias for model selection. In particular,
we have seen in Chapter 2 that the components specified in the device structure,
in conjunction with a component library, define the basic space of possible device
models. The structural relations specified in the device structure constrain the space
of component interactions. Hence, the bias provided by the device structure aids the
search for device models by specifying the space of possible component models and
the space of possible component interactions.

An  alternate,  though  consistent,  viewpoint  is  as  follows:  the  description  of  device 
structure  is  already  a  model  of  the  device  which  embodies  some  set  of  modeling 
decisions.  Hence,  the  model  selection  algorithms  discussed  in  this  thesis  can  be  viewed 
as  making  additional  modeling  decisions,  given  the  modeling  decisions  made  above.  In 
other  words,  certain  aspects  of  modeling  have  been  automated,  while  other  parts  are 
still  the  purview  of  human  experts.  This  division  of  labor  is  particularly  useful  since 
rudimentary  structural  models  of  devices  are  automatically  available  when  human 
designers  use  CAD  tools. 

Finally,  note  that  the  structural  context  of  a  device  is  not  fixed,  but  can  change, 
even  during  the  normal  operation  of  the  device:  the  components  in  a  device  can  change 
as  new  components  are  created  and  old  ones  are  destroyed  (e.g.,  boiling  water  creates 
steam);  the  physical  and  structural  properties  of  components  can  change  (e.g.,  the 
magnetic strip on your credit card can get demagnetized); and the structural relations
between  components  can  change  (e.g.,  the  contact  between  the  hammer  and  the  dome 
of  an  electric  bell  constantly  changes  during  the  normal  operation  of  the  bell). 

3.6.2  Constraints 

Domain-dependent constraints that stem from the structural context are called structural
constraints. Structural constraints are evaluated with respect to a structural
context and a device model. Hence, as the structural context changes, different device
models may be necessary to ensure that all the structural constraints are satisfied.
We distinguish two types of structural constraints: preconditions and coherence
constraints.

Structural  preconditions 

Structural preconditions are first-order constraints associated with model fragment
classes which use only structural predicates. The structural preconditions associated
with a model fragment class are constraints on the structural context that must be
satisfied if a component is to be modeled by that model fragment class. For example,
assuming that composition and metal are structural predicates, the precondition:®

(and (composition ?object ?material)
     (metal ?material))

in the Electrical-conductor model fragment class indicates that a component must
be metallic for it to be modeled as an Electrical-conductor.

Structural preconditions are similar to process preconditions in QP theory [Forbus,
1984]. However, process preconditions are sufficient conditions, i.e., a process
instance is created whenever the process preconditions are satisfied. On the other
hand, structural preconditions are necessary conditions. Hence, the above constraint
does not require that every metallic object be modeled as an Electrical-conductor.
It only says that a component can be modeled as an Electrical-conductor only

®Recall  that  the  variable  “?object”  is  bound  to  the  component  of  interest,  i.e.,  to  the  component 
that  we  want  to  model  as  an  instance  of  this  model  fragment  class. 
if it is metallic. We can express this precisely by rewriting the above constraint as
follows: 

(implies
  (Electrical-conductor ?object)
  (and (composition ?object ?material)
       (metal ?material)))

More generally, if C is a structural precondition associated with model fragment class
M, then this is equivalent to the constraint $M(\texttt{?object}) \Rightarrow C$.

Structural  coherence  constraints 

Structural coherence constraints are additional first-order constraints on the model
fragment classes used to model one or more components. The predicates used in
structural coherence constraints are either structural predicates or model fragment
classes (which are unary predicates). As with structural preconditions, each structural
coherence constraint is associated with a model fragment class, expressing the
constraint that a component can be modeled by that model fragment class only if the
corresponding constraint is satisfied.

For  example,  the  following  structural  coherence  constraint: 

(implies
  (and (Wire ?object)
       (coiled-around ?object ?core)
       (magnetic-material ?core))
  (Magnet ?core))

associated  with  the  Electromagnet  model  fragment  class,  requires  that  a  wire  coiled 
around  a  core  made  of  magnetic  material  can  be  modeled  as  an  electromagnet  only 
if the core is modeled as a magnet. The justification for this domain-dependent
constraint  is  that  such  a  core  amplifies  the  wire’s  magnetic  field  by  three  or  four 
orders  of  magnitude,  converting  the  core  into  a  powerful  magnet.  Hence,  under  these 
circumstances,  an  engineer  would  not  consider  the  model  to  be  adequate  unless  the 
core  were  modeled  as  a  magnet. 




Note  that,  like  structural  preconditions,  structural  coherence  constraints  are  also 
associated  with  model  fragment  classes.  Hence,  the  above  constraint  is  more  precisely 
written  as: 


(implies
  (and (Electromagnet ?object)
       (Wire ?object)
       (coiled-around ?object ?core)
       (magnetic-material ?core))
  (Magnet ?core))

where we have used the fact that $A \Rightarrow (B \Rightarrow C)$ is equivalent to $(A \wedge B) \Rightarrow C$.
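
The propositional equivalence used here can be confirmed by exhaustively checking all eight truth assignments. The short check below is only a sanity test of that equivalence; the helper name `implies` is introduced for illustration.

```python
from itertools import product


def implies(p, q):
    """Material implication: p => q."""
    return (not p) or q


# Compare A => (B => C) with (A and B) => C on every truth assignment.
equivalent = all(
    implies(a, implies(b, c)) == implies(a and b, c)
    for a, b, c in product([False, True], repeat=3)
)
print(equivalent)
```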

In  summary,  an  adequate  model  must  satisfy  all  applicable  structural  constraints: 

• A model fragment M(c) can be part of an adequate model only if all the structural
preconditions and structural coherence constraints associated with model
fragment class M are satisfied, with the variable ?object bound to c.
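
The following sketch shows one way the two structural constraints used as examples above might be evaluated. The representation is an assumption made for illustration: the structural context is a set of ground facts written as tuples, a model is a set of (class, component) pairs, and each constraint is hand-translated into a function rather than parsed from the constraint language. The coherence constraint is read as applying to every core the wire is coiled around, which is an interpretive choice of this sketch.

```python
def electrical_conductor_precondition(facts, model, obj):
    """Necessary condition: obj is composed of some material that is a metal."""
    return any(
        fact[0] == "composition" and fact[1] == obj and ("metal", fact[2]) in facts
        for fact in facts
    )


def electromagnet_coherence(facts, model, obj):
    """If obj is a wire coiled around a magnetic core, the core must be a Magnet."""
    if ("Wire", obj) not in facts:
        return True
    for fact in facts:
        if fact[0] == "coiled-around" and fact[1] == obj:
            core = fact[2]
            if ("magnetic-material", core) in facts and ("Magnet", core) not in model:
                return False
    return True


# Constraints associated with each model fragment class (illustrative subset).
CONSTRAINTS = {
    "Electrical-conductor": [electrical_conductor_precondition],
    "Electromagnet": [electromagnet_coherence],
}


def fragment_allowed(fragment, facts, model):
    """True if all constraints of the fragment's class hold with ?object bound."""
    cls, obj = fragment
    return all(check(facts, model, obj) for check in CONSTRAINTS.get(cls, ()))


# Hypothetical structural context.
facts = {
    ("Wire", "wire-1"),
    ("composition", "wire-1", "copper"),
    ("metal", "copper"),
    ("coiled-around", "wire-1", "core-1"),
    ("magnetic-material", "core-1"),
    ("composition", "gasket-1", "rubber"),
}
model_without_magnet = {("Electromagnet", "wire-1")}
model_with_magnet = {("Electromagnet", "wire-1"), ("Magnet", "core-1")}

print(fragment_allowed(("Electromagnet", "wire-1"), facts, model_without_magnet))
print(fragment_allowed(("Electromagnet", "wire-1"), facts, model_with_magnet))
```

Note that the precondition is only a necessary condition: passing it permits, but does not force, the use of Electrical-conductor for a metallic component.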


3.7  Constraints  from  the  behavioral  context 

In this section we discuss the behavioral context, another important source of constraints
on model adequacy. We will then discuss different types of constraints that
stem from the behavioral context.

3.7.1  Behavioral  context 

The behavioral context of a device is its behavior at a particular time. The behavior
of a device at a particular time is just the values, at that time, of the parameters
that can be used to model the device. Note that the behavioral context of a device
is dependent upon the time at which the behavior snapshot is taken. Hence, the
behavioral context changes with time, as the values of the parameters change. For
example, the behavioral context of the temperature gauge in Figure 1.1 would include
values for the current flowing in the circuit, the temperature of the bimetallic strip,




and  the  magnetic  field  generated  by  the  wire.  As  the  values  of  these  parameters 
change,  the  behavioral  context  also  changes. 

Ideally,  we  would  like  the  behavioral  context  to  refer  to  the  actual  behavior  of  the 
device,  e.g.,  the  values  of  the  parameters  are  obtained  by  actual  measurements  on  a 
physical  prototype.  However,  the  actual  behavior  of  a  device  is  usually  unavailable. 
Rather,  the  behavior  must  be  computed  using  the  equations  of  a  device  model.  Hence, 
the  behavioral  context  can  be  computed  only  after  a  device  model  has  been  selected. 
Of  course,  different  device  models  can  predict  different  behaviors,  each  introducing 
different  errors.  Hence,  it  is  essential  that  the  behavior  be  computed  with  a  device 
model  that  introduces  an  acceptably  low  error. 

A  component’s  behavioral  context  can  provide  modeling  information  not  explicitly 
available  in  the  structural  context.  This  is  because  behavior  generation  explicates 
information  that  is  implicit  in  equations.  Consider  modeling  an  air  gap:  if  the  voltage 
drop  across  it  is  large  enough  (as  in  a  properly  functioning  spark  plug),  then  it  should 
be  modeled  as  an  electrical  conductor;  if  the  voltage  drop  across  it  is  not  large  enough 
(as  in  a  common  electrical  switch),  it  should  be  modeled  as  an  electrical  insulator. 
The value of the voltage drop across the air gap (a behavioral property) determines
the  appropriate  model  for  it. 


3.7.2  Constraints 

Domain-dependent constraints that stem from the behavioral context are called behavioral
constraints. Behavioral constraints are evaluated with respect to a behavioral
context, a structural context, and a device model. Hence, as the behavioral context
changes over time, different device models may be necessary to ensure that all the
behavioral constraints are satisfied (assuming that the structural context remains
the same). As with structural constraints, we distinguish two types of behavioral
constraints: preconditions and coherence constraints.

Behavioral preconditions

Behavioral preconditions are first-order constraints associated with model fragment
classes which use only structural predicates and order relations between parameter
values, i.e., they do not use model fragment classes. The behavioral preconditions
associated with a model fragment class are constraints that must be satisfied if a
component is to be modeled by that model fragment class. For example, the
precondition:

(< (voltage-difference ?object)
   (voltage-difference-threshold ?object))

associated with the Ideal-conductor model fragment class indicates that a component
can be modeled as an Ideal-conductor only if the voltage drop across it is
less than some threshold. As with structural preconditions, behavioral preconditions
are necessary conditions on the use of model fragment classes. Hence, the above
constraint is more precisely written as:

(implies
  (Ideal-conductor ?object)
  (< (voltage-difference ?object)
     (voltage-difference-threshold ?object)))

Behavioral preconditions look superficially similar to quantity conditions in processes
[Forbus, 1984]. However, behavioral preconditions are used to decide which model
fragment classes in an assumption class can be used to model a component. In
contrast, quantity conditions in processes only control the activity of a process, but
not the existence of the process. In essence, behavioral preconditions are modeling
constraints, while quantity conditions are about the physics of the situation.

Behavioral  coherence  constraints 

Behavioral coherence constraints are first-order constraints on the model
fragment classes used to model one or more components. The predicates used in
behavioral coherence constraints are either relations between parameter values,
structural predicates, or model fragment classes (which are unary predicates). As with
behavioral preconditions, each behavioral coherence constraint is associated with a
model fragment class, expressing the constraint that a component can be modeled by
that model fragment class only if the corresponding constraint is satisfied.

For  example,  the  following  behavioral  coherence  constraint: 

(implies
  (>= (* (voltage-difference ?object)
         (current (electrical-terminal-one ?object)))
      (electrical-power-threshold ?object))
  (Thermal-resistor ?object))

in  the  Resistor  model  fragment  class  states  that  when  a  component  is  being  modeled 
as  a  resistor,  and  if  the  dissipated  power  exceeds  a  threshold,  then  this  dissipation 
must  be  explicitly  modeled  by  modeling  the  component  as  a  Thermal-resistor. 

Note  that,  like  behavioral  preconditions,  behavioral  coherence  constraints  are  also 
associated  with  model  fragment  classes.  Hence,  the  above  constraint  is  more  precisely 
written  as: 

(implies
  (and (Resistor ?object)
       (>= (* (voltage-difference ?object)
              (current (electrical-terminal-one ?object)))
           (electrical-power-threshold ?object)))
  (Thermal-resistor ?object))

In  summary,  an  adequate  model  must  satisfy  all  applicable  behavioral  constraints: 

• A model fragment M(c) can be part of an adequate model only if all the behavioral
preconditions and behavioral coherence constraints associated with model
fragment class M are satisfied, with the variable ?object bound to c.
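
The two behavioral constraints used as examples above can be sketched as simple numeric checks against a behavioral context. The representation is assumed for illustration: the behavioral context is a dictionary of parameter values per component, the current at the first electrical terminal is stored under the key `"current"`, and all numeric values and thresholds are invented.

```python
def ideal_conductor_precondition(values, model, obj):
    """Ideal-conductor is allowed only if the voltage drop is below threshold."""
    v = values[obj]
    return v["voltage-difference"] < v["voltage-difference-threshold"]


def resistor_coherence(values, model, obj):
    """If dissipated power reaches the threshold, obj must be a Thermal-resistor."""
    v = values[obj]
    power = v["voltage-difference"] * v["current"]
    if power >= v["electrical-power-threshold"]:
        return ("Thermal-resistor", obj) in model
    return True


# Hypothetical behavioral context (volts, amperes, watts).
values = {
    "wire-1": {
        "voltage-difference": 5.0,
        "current": 2.0,
        "voltage-difference-threshold": 0.5,
        "electrical-power-threshold": 1.0,
    }
}
plain_model = {("Resistor", "wire-1")}
thermal_model = {("Resistor", "wire-1"), ("Thermal-resistor", "wire-1")}

print(ideal_conductor_precondition(values, plain_model, "wire-1"))
print(resistor_coherence(values, plain_model, "wire-1"))
print(resistor_coherence(values, thermal_model, "wire-1"))
```

With these invented values the wire cannot be modeled as an Ideal-conductor, and modeling it as a Resistor is acceptable only when it is also modeled as a Thermal-resistor.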

3.7.3 Thresholds in behavioral constraints

Behavioral constraints can be viewed as deciding whether or not particular phenomena
are significant, and hence worth modeling. Behavioral constraints decide on the
significance of phenomena by checking whether the values of certain parameters are
high enough or low enough. Appropriately set thresholds decide whether or not the
parameter values are high enough or low enough.

For example, the behavioral precondition shown above says that a component
can be modeled as an Ideal-conductor only if the voltage-difference across
the component is insignificant, i.e., small enough. This is checked by comparing
the voltage-difference with the voltage-difference-threshold. Similarly, the
behavioral coherence constraint shown above requires a Resistor to be modeled as
a Thermal-resistor if the heat generated in the Resistor is significant, i.e., large
enough. This is checked by comparing the actual amount of heat generated against
the electrical-power-threshold.

Since  the  thresholds  determine  the  significance  of  various  phenomena,  different 
threshold  settings  lead  to  models  of  differing  accuracy,  i.e.,  to  models  that  include 
different  sets  of  significant  phenomena.  Thresholds  can  be  either  preset  or  computed 
dynamically.  A  widely  used  preset  threshold  in  the  domain  of  fluid  mechanics  is  a 
threshold  of  2300  for  Reynolds  number,  that  distinguishes  laminar  fluid  flow  from 
turbulent  fluid  flow.  Thresholds  can  also  be  preset  by  an  engineer  from  common 
practice.  For  example,  in  the  domain  of  power  distribution  systems,  where  normal 
voltages  are  in  the  range  of  tens  of  thousands  of  volts,  a  voltage  difference  of  up  to  10 
volts  may  be  considered  insignificant.  On  the  other  hand,  in  the  domain  of  electronic 
circuits, voltages of only up to 0.01 volts may be considered insignificant.

While  thresholds  can  be  preset,  a  more  interesting  and  robust  method  of  setting 
the  thresholds  is  to  set  them  dynamically,  based  on  knowledge  of  acceptable  error 
tolerances  on  some  parameters.  These  error  tolerances  can  be  propagated  to  set 
other  thresholds.  This  propagation  can  be  done  using  either  propagation  rules,  or  the 
equations  of  a  device  model.  In  this  thesis,  we  do  not  explore  this  interesting  line  of 
work  any  further.  See  [Shirley  and  Falkenhainer,  1990;  Nayak,  1991]  for  some  initial 
work  in  this  area. 

3.8 Simplicity of models

Thus far, we have said that an adequate model must be consistent and complete, must
be able to explain the expected behavior, and must satisfy all the domain-dependent
constraints stemming from the structural and behavioral contexts. Typically a very
large number of device models satisfy these criteria. Most of these models introduce
irrelevant detail into the causal explanations they generate, either by modeling
irrelevant phenomena, or by including needlessly complex models of relevant phenomena.

For example, assume that the model in Figure 3.4 satisfies all the above criteria.
Other models that augment this model by modeling additional phenomena, such as
the electromagnetic field generated by the wire, would also satisfy the above criteria.
Similarly, models that use more accurate descriptions of phenomena that are already
modeled, e.g., by modeling the wire as a temperature-dependent resistor rather than a
constant-resistance resistor, would also satisfy the above criteria. Such models
introduce irrelevant detail into the causal explanation of how the thermistor’s temperature
affects the pointer’s angular position.

To  address  this  problem  we  need  a  simplicity  ordering  on  the  models.  Given  such 
a  simplicity  ordering,  we  will  say  that  an  adequate  model  is  a  simplest  model  that 
satisfies  all  the  above  criteria,  i.e.,  no  simpler  model  satisfies  the  above  criteria.  The 
simplicity  ordering  we  consider  is  a  partial  ordering  of  the  models,  and  is  based  on 
the  approximation  relation  between  model  fragments.  This  definition  of  simplicity 
is  based  on  the  following  two  intuitions:  (a)  a  model  is  simpler  if  it  models  fewer 
phenomena;  and  (b)  approximate  descriptions  are  simpler  than  more  accurate  ones. 

Definition 3.8 (Simplicity of models) A model $M_2$ is simpler than a model $M_1$
(written $M_2 \leq M_1$) if for each model fragment $m_2 \in M_2$ either (a) $m_2 \in M_1$; or
(b) there is a model fragment $m_1 \in M_1$ such that $m_2$ is an approximation of $m_1$,
i.e., $\mathit{approximation}(m_1, m_2)$. $M_2$ is strictly simpler than $M_1$ (written $M_2 < M_1$) if
$M_2 \leq M_1$ and $M_1 \not\leq M_2$.

For example, a model simpler than the one shown in Figure 3.4 is one that removes
the model fragment Thermal-resistor(wire-1). A more complex model
results from replacing the model fragment Constant-resistance(wire-1) by the
model fragment Temperature-dependent-resistor(wire-1). A model that is
incomparable to the one in Figure 3.4 is the one in which we both remove the model
fragment Thermal-resistor(wire-1) and replace Constant-resistance(wire-1)
by Temperature-dependent-resistor(wire-1).
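The ordering of Definition 3.8 can be stated operationally. The following Python sketch is illustrative only: the two-fragment stand-in for the model of Figure 3.4 and the single approximation pair are assumptions made for the example (the figure itself is not reproduced here), and strictness is read as "simpler, but not conversely."

```python
# Approximation pairs are (m1, m2), read "m2 is an approximation of m1".
approximation = {
    ("Temperature-dependent-resistor(wire-1)", "Constant-resistance(wire-1)"),
}

def simpler(model_2, model_1, approximation):
    """True if model_2 <= model_1: every fragment of model_2 is either in
    model_1 or is an approximation of some fragment of model_1."""
    return all(
        frag in model_1
        or any((other, frag) in approximation for other in model_1)
        for frag in model_2
    )

def strictly_simpler(model_2, model_1, approximation):
    """True if model_2 <= model_1 holds but model_1 <= model_2 does not."""
    return (simpler(model_2, model_1, approximation)
            and not simpler(model_1, model_2, approximation))

# Hypothetical stand-in for the model of Figure 3.4 (only the two fragments
# named in the text are included).
base = {"Thermal-resistor(wire-1)", "Constant-resistance(wire-1)"}
smaller = {"Constant-resistance(wire-1)"}
larger = {"Thermal-resistor(wire-1)", "Temperature-dependent-resistor(wire-1)"}
mixed = {"Temperature-dependent-resistor(wire-1)"}

print(strictly_simpler(smaller, base, approximation))   # fragment removed
print(strictly_simpler(base, larger, approximation))    # approximation replaced
print(simpler(mixed, base, approximation), simpler(base, mixed, approximation))
```

The last line checks both directions for the third model of the example; neither holds, which is what "incomparable" means for a partial ordering.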

It  is  important  to  note  that  this  definition  of  model  simplicity  is  based  purely 
on  the  intuitions  mentioned  above.  In  particular,  the  definition  does  not  guarantee 
that  a  simpler  model  is  more  efficient.  Nor  does  it  guarantee  that  simpler  models 
lead  to  simpler  causal  explanations  of  the  expected  behavior.  However,  while  there 
are  no  such  guarantees,  we  believe  that  the  above  definition  of  simplicity  provides  a 
good  heuristic  for  identifying  more  efficient  models,  and  for  generating  simpler  causal 
explanations.  In  particular,  it  is  common  engineering  practice  to  simplify  models  by 
disregarding  irrelevant  phenomena  and  by  using  all  applicable  approximations.  In 
addition,  in  Chapter  5  we  shall  introduce  a  special  class  of  approximations,  called 
causal  approximations,  which  will  ensure  that  the  above  definition  of  simplicity  will, 
in  fact,  lead  to  simpler  causal  explanations. 

We  will  require  that  adequate  models  be  as  simple  as  possible,  provided  the  rest 
of  the  criteria  discussed  in  this  chapter  are  satisfied: 

•  An  adequate  model  is  a  simplest  model  that  meets  all  the  criteria  discussed  in 
this  chapter. 


3.9  Summary 

The  adequacy  of  models  is  closely  linked  to  the  task  for  which  the  model  is  to  be 
used.  In  this  thesis,  we  consider  the  adequacy  of  models  with  respect  to  the  task 
of  generating  causal  explanations  for  phenomena  of  interest.  Causal  explanations 
play  an  important  role  in  reasoning  about  physical  systems,  not  only  as  a  vehicle 
for  communicating  with  human  users,  but  also  to  focus  other  tasks  such  as  diagnosis, 
design, and simulation. A widely used class of causal explanations is based on causal
dependencies between parameters. These causal dependencies between parameters,
also called the causal ordering of the parameters, are derived from the equations
comprising a device model.

Our  definition  of  model  adequacy  is  based  on  the  following  inputs: 

1.  The  component  library,  described  in  Chapter  2,  which  is  a  description  of  the 
components,  their  possible  models,  and  various  relations  between  the  possible 
models. 

2. The expected behavior, which is the phenomenon for which the causal explanation
is desired. The expected behavior is represented as a query, $\mathit{causes}(p_1, p_2)$,
requesting a causal explanation for how one parameter, $p_2$, causally depends on
another, $p_1$.

3.  The  structural  context,  which  includes  the  different  aspects  of  the  structure  of 
the  device.  The  structural  context  defines  the  basic  space  of  possible  device 
models. 

4.  The  behavioral  context,  which  includes  the  values  of  parameters  that  can  be 
used  to  model  the  device. 

5.  The  structural  constraints,  which  are  a  set  of  domain-dependent  constraints 
that  can  be  evaluated  using  the  structural  context  and  the  device  model. 

6.  The  behavioral  constraints,  which  are  a  set  of  domain-dependent  constraints 
that  can  be  evaluated  using  the  behavioral  context,  the  structural  context,  and 
the  device  model. 

Given  the  above  set  of  inputs,  the  adequacy  of  a  device  model  is  defined  as  follows: 

1.  An  adequate  model  must  be  consistent,  i.e.,  its  equations  must  not  be  overde¬ 
termined  and  it  must  not  include  contradictory  model  fragments. 

2. An adequate model must be complete, i.e., its equations must be complete and
it must include model fragments from every required assumption class.

3. An adequate model must be able to explain the expected behavior, i.e., the
causal ordering generated from the model's equations must subsume the causal
dependency for which an explanation is requested.

4.  An  adequate  model  must  satisfy  all  domain-dependent  structural  and  behavioral 
constraints. 

5.  An  adequate  model  is  a  simplest  model  that  satisfies  the  above  four  conditions. 


Chapter  4 


Complexity  of  model  selection 


In this chapter we analyze the complexity of the problem of finding adequate device
models. In particular, we will show that this problem is NP-hard. We will provide
three  different  proofs  of  this  result,  with  each  proof  being  based  on  a  special  case  of 
the  general  problem.  These  special  cases  help  to  identify  three  different  sources  of  the 
intractability  of  the  problem  of  finding  adequate  device  models.  Informally,  the  three 
sources  of  intractability  are:  (a)  deciding  what  phenomena  to  model;  (b)  deciding 
how  to  model  the  selected  phenomena;  and  (c)  ensuring  that  all  domain-dependent 
constraints  are  satisfied. 

In  Section  4.1  we  present  a  formalization  of  the  problem  of  finding  adequate  mod¬ 
els.  In  particular,  we  show  how  the  elements  of  this  formalization  are  derived  from  the 
inputs  to  the  model  selection  problem  discussed  in  the  previous  chapter.  Section  4.2 
contains  the  complexity  analysis  of  the  different  special  cases  of  the  general  problem 
of  finding  adequate  models.  In  Section  4.3  we  briefly  discuss  the  complexity  of  some 
related problems. In particular, we show that the problem of finding just a consistent
and complete model is intractable, and that finding adequate models remains
intractable even if each equation can have exactly one causal orientation.

4.1 Formalizing the problem

In  this  section  we  develop  a  formal  statement  of  the  problem  of  finding  an  adequate 
device  model.  We  start  by  formalizing  the  input  to  the  model  selection  problem,  and 
then  give  a  formal  statement  of  the  problem. 

4.1.1  Formalizing  the  input 

In  the  previous  chapter  we  saw  that  the  inputs  to  our  definition  of  model  adequacy  are 
the  following:  the  component  library,  the  expected  behavior,  the  structural  context, 
the behavioral context, the structural constraints, and the behavioral constraints.
The formalization we develop here will include representations of the first two and
the last two of these inputs. However, the formalization will not include an explicit
representation of the structural and behavioral contexts. Rather, we assume that
these are given, and we use them implicitly in formalizing the other inputs. This
means  that  the  complexity  results  of  this  chapter,  and  the  algorithms  developed  in 
the  next  chapter,  will  have  nothing  to  say  about  how  the  structural  and  behavioral 
contexts  are  computed.  Chapters  7  and  8  will  discuss  this  issue  in  more  detail. 

We formalize the input to the model selection problem as a tuple $\mathcal{I}$:

$$\mathcal{I} = (\mathcal{M}, \mathit{contradictory}, \mathit{approximation}, \mathcal{A}, \mathcal{C}, p, q) \tag{4.1}$$

where $\mathcal{M}$ is the set of all applicable model fragments, $\mathit{contradictory}$ and $\mathit{approximation}$
are binary relations on model fragments as discussed in Chapter 2, $\mathcal{A}$ is the set of all
applicable assumption classes, $\mathcal{C}$ is a set of propositional coherence constraints, and $p$
and $q$ are parameters representing the fact that $\mathit{causes}(p, q)$ is the expected behavior.
We now discuss each of these, focusing in particular on how the component library,
the structural constraints, and the behavioral constraints are translated into elements
of the above tuple. As a typographic convention, we will typeset all elements of the
input using typewriter font, and all elements of our formalization using italics or
calligraphic letters.

Propositional coherence constraints

We  start  by  introducing  propositional  coherence  constraints.  A  propositional  coher¬ 
ence  constraint  is  just  a  propositional  formula  in  which  the  propositions  are  model 
fragments.  A  propositional  coherence  constraint  is  satisfied  with  respect  to  a  set  M 
of  model  fragments  just  in  case  the  corresponding  propositional  formula  is  satisfied 
by  the  interpretation  that  assigns  true  to  a  proposition  if  and  only  if  the  proposition 
is in M, and false otherwise. For example, the propositional coherence constraint

$$(m_1 \lor m_2) \Rightarrow m_3$$

is satisfied by the set $\{m_1, m_3\}$ of model fragments. It is also satisfied by the set
$\{m_2, m_3\}$, and by the empty set.
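The satisfaction test just described is ordinary propositional evaluation under the interpretation induced by a set of model fragments. The sketch below is a minimal illustration, assuming a nested-tuple encoding of formulas that is our own choice for the example and not part of the formalization.

```python
def satisfied(formula, model):
    """Evaluate a propositional coherence constraint against a set of
    model fragments: a proposition is true iff the fragment is in the model."""
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):          # a model fragment used as a proposition
        return formula in model
    op, *args = formula
    if op == "not":
        return not satisfied(args[0], model)
    if op == "and":
        return all(satisfied(a, model) for a in args)
    if op == "or":
        return any(satisfied(a, model) for a in args)
    if op == "implies":
        return (not satisfied(args[0], model)) or satisfied(args[1], model)
    raise ValueError(f"unknown connective: {op!r}")

# (m1 or m2) => m3, the example constraint from the text.
constraint = ("implies", ("or", "m1", "m2"), "m3")

for model in ({"m1", "m3"}, {"m2", "m3"}, set(), {"m1"}):
    print(sorted(model), satisfied(constraint, model))
```

The first three sets are the ones named in the text; the fourth, {m1}, is added only to show a set that violates the constraint.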

As a convenient shorthand, we allow the use of assumption classes in propositional
coherence constraints. Recall that an assumption class is a set of mutually
contradictory model fragments. Hence, we use an assumption class as a shorthand
for a disjunction of the model fragments in the assumption class. For example, if the
assumption class $A$ contains the model fragments $m_1$ and $m_2$, then the propositional
coherence constraint

$$m_3 \Rightarrow A$$

is equivalent to the propositional coherence constraint

$$m_3 \Rightarrow (m_1 \lor m_2)$$

Recall  that  a  model  is  just  a  set  of  model  fragments.  C  is  the  set  of  propositional 
coherence  constraints  that  must  be  satisfied  by  any  adequate  model.  As  we  shall 
see, the propositional coherence constraints in C will be defined using the structural
and  behavioral  coherence  constraints,  and  the  required  assumption  classes  of  model 
fragments. 

The  component  library 

We formalize the component library as a set $\mathcal{M}'$ of all model fragments. (Later,
when we discuss the structural and behavioral preconditions, we will introduce the
set $\mathcal{M} \subseteq \mathcal{M}'$, which is the set of all applicable model fragments.) The model fragments
in $\mathcal{M}'$ are constructed from the structural context, which specifies the components
used in the device, and the component library, which specifies the possible models of
each component class. $\mathcal{M}'$ contains a model fragment for each of the possible ways
in which each component of the device can be modeled, i.e., if c-1 is a component
of the device, and if c-1 is an instance of component class C, and if M is a model
fragment class that is a possible model of C, then $\mathcal{M}'$ contains the model fragment
M(c-1). The possible models of a component class are just the transitive closure of the
possible-models of the class.

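$$
\begin{aligned}
\mathcal{M}' = \{\, \texttt{M(c-1)} :\ & \texttt{c-1} \text{ is a component of the device} \\
\land\ & \texttt{c-1} \text{ is an instance of component class } \texttt{C} \\
\land\ & \texttt{M} \text{ is a possible model of } \texttt{C} \,\}
\end{aligned}
\tag{4.2}
$$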

Note  that  the  device  components  used  in  defining  M'  include  the  structural  abstrac¬ 
tions  discussed  in  Section  2.4.2. 
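Equation 4.2 can be read as a small construction procedure: take the transitive closure of the possible-models relation for each component's class, and instantiate every class in that closure on the component. The Python sketch below assumes a dictionary encoding of the component library, and the particular classes listed are hypothetical; neither is taken from the actual library of Chapter 2.

```python
def possible_models(component_class, direct):
    """Transitive closure of the direct possible-models relation."""
    seen = set()
    stack = list(direct.get(component_class, ()))
    while stack:
        model_class = stack.pop()
        if model_class not in seen:
            seen.add(model_class)
            stack.extend(direct.get(model_class, ()))
    return seen

def all_model_fragments(components, direct):
    """Build M': one fragment M(c) per component c and possible model M of its class."""
    return {
        f"{model_class}({name})"
        for name, component_class in components.items()
        for model_class in possible_models(component_class, direct)
    }

# Hypothetical library fragment and device, for illustration only.
direct = {
    "Wire": ["Electrical-conductor"],
    "Electrical-conductor": ["Resistor", "Ideal-conductor"],
    "Resistor": ["Constant-resistance", "Temperature-dependent-resistor"],
}
components = {"wire-1": "Wire"}

fragments = all_model_fragments(components, direct)
print(sorted(fragments))
```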

The component library also defines a number of important relations: contradictory,
approximation, assumption-class, required-assumption-classes, and
generalization. As discussed in Chapter 2, we use the contradictory and approximation
declarations of the component library to define the $\mathit{contradictory}$ and $\mathit{approximation}$
relations between model fragments, respectively. In particular, if model fragment class
M1 specifies model fragment class M2 as a contradictory class, and if c is a component
such that M1(c) and M2(c) are model fragments in $\mathcal{M}'$, then we include the literal

$$\mathit{contradictory}(\texttt{M1(c)}, \texttt{M2(c)})$$

in our formalization. Similarly, if model fragment class M1 specifies model fragment
class M2 as an approximation, and if c is a component such that M1(c) and M2(c)
are model fragments in $\mathcal{M}'$, then we include the literal

$$\mathit{approximation}(\texttt{M1(c)}, \texttt{M2(c)})$$

in our formalization. The properties of $\mathit{contradictory}$ and $\mathit{approximation}$ are discussed
in detail in Chapter 2. For convenience, we restate their most important properties
here:

$$
\begin{aligned}
& \neg\,\mathit{contradictory}(m_1, m_1) && (4.3) \\
& \mathit{contradictory}(m_1, m_2) \Rightarrow \mathit{contradictory}(m_2, m_1) && (4.4) \\
& \neg\,\mathit{approximation}(m_1, m_1) && (4.5) \\
& \mathit{approximation}(m_1, m_2) \Rightarrow \neg\,\mathit{approximation}(m_2, m_1) && (4.6) \\
& \mathit{approximation}(m_1, m_2) \land \mathit{approximation}(m_2, m_3) \Rightarrow \mathit{approximation}(m_1, m_3) && (4.7) \\
& \mathit{approximation}(m_1, m_2) \Rightarrow \mathit{contradictory}(m_1, m_2) && (4.8)
\end{aligned}
$$
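Properties 4.3 through 4.8 are easy to check mechanically for a given pair of relations. The following Python sketch is one way to do so, assuming the relations are stored as sets of ordered pairs; the three fragment names are hypothetical.

```python
from itertools import product

def violated_properties(fragments, contradictory, approximation):
    """Return the equation numbers (4.3 to 4.8) that the relations violate."""
    bad = []
    if any((m, m) in contradictory for m in fragments):
        bad.append("4.3")                       # contradictory is irreflexive
    if any((b, a) not in contradictory for (a, b) in contradictory):
        bad.append("4.4")                       # contradictory is symmetric
    if any((m, m) in approximation for m in fragments):
        bad.append("4.5")                       # approximation is irreflexive
    if any((b, a) in approximation for (a, b) in approximation):
        bad.append("4.6")                       # approximation is asymmetric
    if any((a, d) not in approximation
           for (a, b), (c, d) in product(approximation, repeat=2) if b == c):
        bad.append("4.7")                       # approximation is transitive
    if any(pair not in contradictory for pair in approximation):
        bad.append("4.8")                       # approximations are contradictory
    return bad

# Hypothetical chain of three fragments, from most to least accurate.
fragments = {"accurate", "medium", "coarse"}
contradictory = {(a, b) for a in fragments for b in fragments if a != b}
good = {("accurate", "medium"), ("medium", "coarse"), ("accurate", "coarse")}
broken = good - {("accurate", "coarse")}        # drops the transitive pair

print(violated_properties(fragments, contradictory, good))
print(violated_properties(fragments, contradictory, broken))
```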


We  use  the  assumption-classes  of  model  fragment  classes  to  define  the  set  A'  of 
all  assumption  classes.  In  particular,  if  c  is  a  component,  and  M  is  a  model  fragment 
class  that  specifies  A  as  its  assumption-class,  and  if  M(c)  is  a  model  fragment  in 
M',  then  we  say  that  A(c)  is  an  assumption  class  in  A'  that  contains  the  model 
fragment  M(c). 

We formalize the required-assumption-classes of model fragment classes using
propositional coherence constraints. In particular, if a model fragment class M specifies
A as a required-assumption-class, and if c is a component such that M(c) is a
model fragment in $\mathcal{M}'$, then we add the propositional coherence constraint

$$\texttt{M(c)} \Rightarrow \texttt{A(c)}$$

to the set $\mathcal{C}$ of propositional coherence constraints. Since an adequate model must
satisfy each constraint in $\mathcal{C}$, it follows that every adequate model will include a model
fragment from each required assumption class.

Finally, we represent the generalization relation between model fragment classes
using propositional coherence constraints. In particular, if a model fragment class M1
is a generalization of a model fragment class M2, and if M1(c) and M2(c) are model
fragments in $\mathcal{M}'$, then we add the propositional coherence constraint

$$\texttt{M2(c)} \Rightarrow \texttt{M1(c)}$$

to the set $\mathcal{C}$ of propositional coherence constraints. This ensures that every adequate
model that models c as an instance of M2 also models it as an instance of M1.

Structural and behavioral preconditions

The  structural  and  behavioral  preconditions  associated  with  a  model  fragment  class 
are  necessary  conditions  for  a  component  to  be  modeled  by  that  class.  Recall  that 
structural  and  behavioral  preconditions  are  constraints  that  use  only  structural  pred¬ 
icates  and  order  relations  between  parameters,  i.e.,  they  do  not  use  model  fragment 
classes.  Hence,  these  preconditions  are  not  evaluated  with  respect  to  a  device  model, 
but  rather  can  be  evaluated  using  only  the  structural  and  behavioral  contexts  of  the 
device. 

We use the structural and behavioral preconditions to define the set $\mathcal{M} \subseteq \mathcal{M}'$ of
applicable model fragments, i.e., the set of model fragments for which the structural
and behavioral preconditions are satisfied. More precisely, let M be a model fragment
class and let c be a component such that M(c) is a model fragment in $\mathcal{M}'$. M(c) is
in $\mathcal{M}$ if and only if all the structural and behavioral preconditions associated with
model fragment class M are satisfied when the variable “?object” is bound to c. For
example, the model fragment Electrical-conductor(wire-1) is in $\mathcal{M}$ only if the
structural precondition

(and (composition ?object ?material)
     (metal ?material))

is satisfied when “?object” is bound to wire-1. Recall that the other variables in
this constraint, like “?material,” are existentially quantified.

Hence, from the structural and behavioral preconditions, the structural and
behavioral contexts, and the set $\mathcal{M}'$ of all model fragments, we define the set $\mathcal{M}$ of
all applicable model fragments. Using $\mathcal{M}$ and the set $\mathcal{A}'$ of all assumption classes, it
is straightforward to define the set $\mathcal{A}$ of all applicable assumption classes. Informally,
$\mathcal{A}$ is the set of assumption classes that results from restricting the assumption classes
in $\mathcal{A}'$ to contain only applicable model fragments. More precisely, if $A' \in \mathcal{A}'$ is an
assumption class, then let $\mathit{applicable}(A')$ be the maximal subset of $A'$ that contains
only applicable model fragments, i.e., model fragments from $\mathcal{M}$. Hence, we have:

$$\mathcal{A} = \{A : A = \mathit{applicable}(A') \land A' \in \mathcal{A}' \land A \neq \emptyset\} \tag{4.9}$$
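Equation 4.9 amounts to intersecting each assumption class with the set of applicable model fragments and discarding the classes that become empty. A minimal Python sketch, assuming assumption classes are stored in a dictionary keyed by a class name (the names and members below are hypothetical):

```python
def applicable_classes(all_classes, applicable_fragments):
    """Restrict each assumption class to applicable fragments and drop
    the classes that end up empty (Equation 4.9)."""
    result = {}
    for name, members in all_classes.items():
        restricted = frozenset(members) & frozenset(applicable_fragments)
        if restricted:
            result[name] = restricted
    return result

# Hypothetical assumption classes and applicable fragments.
all_classes = {
    "Resistance(wire-1)": {
        "Constant-resistance(wire-1)",
        "Temperature-dependent-resistor(wire-1)",
    },
    "Magnetism(ptr-1)": {"Magnet(ptr-1)"},
}
applicable_fragments = {"Constant-resistance(wire-1)"}

classes = applicable_classes(all_classes, applicable_fragments)
print(classes)
```

In this example the second class disappears altogether because none of its members is applicable.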

Structural and behavioral coherence constraints

The  structural  and  behavioral  coherence  constraints  associated  with  model  fragment 
classes  are  constraints  that  use  structural  predicates,  order  relations  between  param¬ 
eters,  and  unary  predicates  representing  model  fragment  classes.  Hence,  these  coher¬ 
ence  constraints  are  evaluated  with  respect  to  the  structural  context,  the  behavioral 
context,  and  a  device  model.  We  can  remove  the  dependence  of  these  coherence  con¬ 
straints  on  the  structural  and  behavioral  contexts  by  converting  each  of  them  into  a 
set  of  propositional  coherence  constraints.  Conceptually,  this  is  achieved  by  instanti¬ 
ating  each  coherence  constraint  in  all  possible  ways  over  the  universe  of  all  objects  in 
the knowledge base.^1 Each resulting instantiated constraint can be converted into a
propositional  coherence  constraint  by  replacing  each  ground  literal  in  the  constraint 
by  true,  false,  or  a  model  fragment,  according  to  the  following  rules: 

1.  If  the  literal  involves  a  structural  predicate,  use  the  structural  context  to  decide 
whether  the  literal  is  true  or  false. 

2.  If  the  literal  is  an  order  relation  between  parameters,  use  the  behavioral  context 
to  decide  whether  the  literal  is  true  or  false. 

3. If the literal involves a unary predicate representing a model fragment class,
then check whether or not the corresponding model fragment is in $\mathcal{M}$. (The
model fragment corresponding to the ground literal (M c) is, of course, M(c).)
If the corresponding model fragment is in $\mathcal{M}$, replace the literal by the model
fragment, else replace the literal by false.

Using the above procedure, each instantiated coherence constraint can be converted
into a propositional coherence constraint. All such propositional coherence constraints,
except the ones that are vacuously true, are added to $\mathcal{C}$ as constraints that
must be satisfied by any adequate model.^2

^1 The universe of all objects in the knowledge base would include, among others, the components
in the device, the components' terminals, and the parameters.

^2 If there are any vacuously false propositional coherence constraints then that means that the
corresponding coherence constraint can never be satisfied, and hence there is no adequate model.

For example, consider the following structural coherence constraint:

(implies
  (and (Electromagnet ?object)
       (Wire ?object)
       (coiled-around ?object ?core)
       (Magnetic-material ?core))
  (Magnet ?core))

One way to instantiate the above constraint is to bind “?object” to wire-1, and to
bind “?core” to bms-1 to get the following ground constraint:

(implies
  (and (Electromagnet wire-1)
       (Wire wire-1)
       (coiled-around wire-1 bms-1)
       (Magnetic-material bms-1))
  (Magnet bms-1))

To convert the above ground constraint into a propositional coherence constraint,
let us assume that the structural context says that wire-1 is a Wire, and that it
is coiled-around bms-1, which is made of Magnetic-material. Hence, the above
constraint reduces to the following propositional coherence constraint:

$$\texttt{Electromagnet(wire-1)} \Rightarrow \texttt{Magnet(bms-1)}$$

On the other hand, if we bind “?core” to ptr-1, then we get the following ground
constraint:

(implies
  (and (Electromagnet wire-1)
       (Wire wire-1)
       (coiled-around wire-1 ptr-1)
       (Magnetic-material ptr-1))
  (Magnet ptr-1))

Since  wire-1  is  not  coiled-around  ptr-1,  the  third  conjunct  in  the  antecedent  of 
the  above  constraint  gets  replaced  by  false,  and  hence  the  propositional  coherence 
constraint  corresponding  to  the  above  ground  constraint  is  vacuously  true. 
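The two instantiations above can be reproduced with a short program. The Python sketch below is a simplified illustration of the conversion rules, not the procedure used in our implementation: it encodes constraints as nested tuples, handles structural predicates (rule 1) and model fragment class predicates (rule 3) but omits order relations (rule 2), and assumes a particular set of structural facts and applicable fragments for the example.

```python
def ground(term, binding):
    """Substitute variables (strings such as '?core') throughout a nested tuple."""
    if isinstance(term, tuple):
        return tuple(ground(t, binding) for t in term)
    return binding.get(term, term)

def to_propositional(c, facts, fragment_classes, applicable):
    """Replace each ground literal by True, False, or a model fragment."""
    op = c[0]
    if op in ("implies", "and", "or", "not"):
        return (op,) + tuple(
            to_propositional(x, facts, fragment_classes, applicable) for x in c[1:])
    if op in fragment_classes:                  # rule 3
        fragment = f"{op}({c[1]})"
        return fragment if fragment in applicable else False
    return c in facts                           # rule 1

def simplify(f):
    """Constant-fold True and False out of a propositional formula."""
    if not isinstance(f, tuple):
        return f
    op, args = f[0], [simplify(a) for a in f[1:]]
    if op == "not":
        return (not args[0]) if isinstance(args[0], bool) else ("not", args[0])
    if op in ("and", "or"):
        absorbing = (op == "or")                # True absorbs 'or', False absorbs 'and'
        if any(a is absorbing for a in args):
            return absorbing
        rest = [a for a in args if not isinstance(a, bool)]
        if not rest:
            return not absorbing
        return rest[0] if len(rest) == 1 else (op, *rest)
    if op == "implies":
        antecedent, consequent = args
        if antecedent is False or consequent is True:
            return True                         # vacuously true
        if antecedent is True:
            return consequent
        if consequent is False:
            return simplify(("not", antecedent))
        return ("implies", antecedent, consequent)
    raise ValueError(f"unknown connective: {op!r}")

constraint = ("implies",
              ("and", ("Electromagnet", "?object"), ("Wire", "?object"),
               ("coiled-around", "?object", "?core"), ("Magnetic-material", "?core")),
              ("Magnet", "?core"))
# Assumed structural context and applicable fragments for the example.
facts = {("Wire", "wire-1"), ("coiled-around", "wire-1", "bms-1"),
         ("Magnetic-material", "bms-1")}
fragment_classes = {"Electromagnet", "Magnet"}
applicable = {"Electromagnet(wire-1)", "Magnet(bms-1)"}

def convert(binding):
    grounded = ground(constraint, binding)
    return simplify(to_propositional(grounded, facts, fragment_classes, applicable))

first = convert({"?object": "wire-1", "?core": "bms-1"})
second = convert({"?object": "wire-1", "?core": "ptr-1"})
print(first)
print(second)
```

Under these assumptions the first binding yields the implication between the two model fragments, and the second folds to true, so it would not be added to the set of constraints.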

In summary, given the structural and behavioral contexts, the structural and
behavioral  coherence  constraints  can  be  converted  into  a  set  of  propositional  coherence 
constraints.  Note  that  the  above  discussion  does  not  imply  that  it  is  a  good  idea  to 
convert  all  the  structural  and  behavioral  coherence  constraints  into  propositional 
coherence  constraints,  or  that  the  above  is  the  best  way  to  do  it.  The  point  of 
the  discussion  is  to  show  that,  given  the  structural  and  behavioral  contexts,  the 
structural  and  behavioral  coherence  constraints  can  be  viewed  as  a  set  of  propositional 
coherence  constraints.  This  will  simplify  the  complexity  analysis  of  this  chapter,  and 
the  development  of  efficient  algorithms  in  the  next  chapter. 

4.1.2  Problem  statement 

Given  the  formalization  of  the  input  to  the  problem  of  finding  an  adequate  model  as 
the  following  tuple: 

$$\mathcal{I} = (\mathcal{M}, \mathit{contradictory}, \mathit{approximation}, \mathcal{A}, \mathcal{C}, p, q)$$

we are in a position to give a precise statement of the problem itself. Before we do
this we define three important types of models: coherent models, causal models, and
adequate models.

Coherent,  causal,  and  adequate  models 

Recall  that  a  model  is  a  set  of  model  fragments.  We  will  require  that  the  model 
fragments  in  a  model  must  be  in  M,  i.e.,  we  will  only  consider  models  consisting  of 
applicable  model  fragments.  A  coherent  model  is  a  complete,  consistent  model,  that 
satisfies  all  the  propositional  coherence  constraints  in  C: 

Definition 4.1 (Coherent models) A model $M \subseteq \mathcal{M}$ is said to be a coherent
model if and only if the following conditions are satisfied:

1.  M  contains  no  mutually  contradictory  model  fragments. 

2.  The  equations  of  M  are  complete  (Definition  S.f). 

3. All the constraints in $\mathcal{C}$ are satisfied by $M$.

Conditions  1  and  2  together  ensure  that  coherent  models  are  consistent  (Defini¬ 
tion  3.6),  since  if  the  equations  of  M  are  complete  then  the  equations  are  not  over¬ 
constrained.  Conditions  2  and  3  together  ensure  that  coherent  models  are  complete 
(Definition  3.7),  since  C  contains  constraints  that  ensure  that  coherent  models  contain 
model  fragments  from  all  required  assumption  classes. 

A  causal  model  is  a  coherent  model  that  also  explains  the  expected  behavior. 

Definition 4.2 (Causal model) A model $M \subseteq \mathcal{M}$ is a causal model, with respect
to the expected behavior $\mathit{causes}(p, q)$, if and only if (a) $M$ is a coherent model; and
(b) $q$ causally depends on $p$ in the causal ordering generated from the equations of $M$,
i.e., $(p, q) \in C(E(M))$.

Finally,  an  adequate  model  is  just  a  minimal  causal  model. 

Definition 4.3 (Adequate model) A model $M \subseteq \mathcal{M}$ is an adequate model if and
only if $M$ is a causal model and no coherent model strictly simpler than $M$ is a causal
model, i.e., for all coherent models $M'$, such that $M' < M$, $M'$ is not a causal model.
(Model  simplicity  is  defined  as  in  Definition  3.8.) 
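Definitions 4.1 through 4.3 suggest an obvious, if exponential, procedure: enumerate all subsets of the applicable model fragments, keep the causal models, and discard any that has a strictly simpler causal model. The Python sketch below illustrates this brute-force reading only. The predicate `is_causal` is a stand-in supplied by the caller (here a toy rule), since checking completeness of equations and causal orderings relies on the algorithms of the previous chapter, and the fragment names are hypothetical.

```python
from itertools import combinations

def simpler(model_2, model_1, approximation):
    """model_2 <= model_1 in the sense of Definition 3.8."""
    return all(
        frag in model_1
        or any((other, frag) in approximation for other in model_1)
        for frag in model_2
    )

def strictly_simpler(model_2, model_1, approximation):
    return (simpler(model_2, model_1, approximation)
            and not simpler(model_1, model_2, approximation))

def adequate_models(fragments, approximation, is_causal):
    """Enumerate every subset of fragments (exponentially many) and return
    the causal models that have no strictly simpler causal model."""
    ordered = sorted(fragments)
    causal = [
        frozenset(subset)
        for size in range(len(ordered) + 1)
        for subset in combinations(ordered, size)
        if is_causal(frozenset(subset))
    ]
    return [m for m in causal
            if not any(strictly_simpler(other, m, approximation) for other in causal)]

# Toy instance: a model counts as causal iff it contains exactly one of the
# two resistor fragments. This rule is an assumption made for the example.
fragments = {"R-const", "R-temp", "EM-field"}
approximation = {("R-temp", "R-const")}     # R-const approximates R-temp

def is_causal(model):
    return len(model & {"R-const", "R-temp"}) == 1

result = adequate_models(fragments, approximation, is_causal)
print(result)
```

Because a causal model is by definition coherent, comparing only against other causal models is equivalent to the quantification over coherent models in Definition 4.3.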

The  minimal  causal  model  problem 

We  now  give  a  formal  statement  of  the  problem  of  finding  an  adequate  model.  We 
call  this  problem  the  MINIMAL  CAUSAL  MODEL  problem. 

Definition 4.4 (MINIMAL CAUSAL MODEL) Let the input to the problem of finding
an adequate model be the tuple $\mathcal{I}$:

$$\mathcal{I} = (\mathcal{M}, \mathit{contradictory}, \mathit{approximation}, \mathcal{A}, \mathcal{C}, p, q)$$

where the elements of the tuple are as in Equation 4.1. Find an adequate model with
respect to $\mathcal{I}$, i.e., find a minimal causal model with respect to $\mathcal{I}$.



To  help  in  analyzing  the  complexity  of  the  MINIMAL  CAUSAL  MODEL  problem,  we 
introduce  the  CAUSAL  MODEL  problem,  which  is  the  decision  problem  corresponding 
to  the  MINIMAL  CAUSAL  MODEL  problem.  The  CAUSAL  MODEL  problem  asks  whether 
or  not  there  exists  a  causal  model,  without  requiring  this  causal  model  to  be  minimal. 


Definition 4.5 (CAUSAL MODEL) Let the input to the problem of finding an adequate
model be the tuple $\mathcal{I}$:

$$\mathcal{I} = (\mathcal{M}, \mathit{contradictory}, \mathit{approximation}, \mathcal{A}, \mathcal{C}, p, q)$$

where the elements of the tuple are as in Equation 4.1. Does there exist a causal
model with respect to $\mathcal{I}$?


4.2  Complexity  analysis 

In  this  section  we  analyze  the  complexity  of  the  CAUSAL  MODEL  problem  and  the 
MINIMAL  CAUSAL  MODEL  problem.  In  particular,  we  will  show  that  the  CAUSAL 
MODEL problem is NP-complete. An immediate corollary of this is that the MINIMAL
CAUSAL MODEL problem is NP-hard. Since it is strongly believed that P $\neq$ NP,
these  results  imply  that,  in  general,  the  problem  of  finding  adequate  device  models 
is  intractable,  i.e.,  there  is  no  polynomial  time  algorithm  for  finding  adequate  device 
models. 

We  prove  that  the  CAUSAL  MODEL  problem  is  NP-complete  by  first  showing  that 
it  is  in  NP,  and  then  showing  that  three  of  its  special  cases  are  NP-hard.  The  three 
special  cases  will  identify  three  sources  for  the  intractability  of  the  CAUSAL  MODEL 
problem. Informally, the three sources are: (a) deciding what phenomena to model,
i.e.,  deciding  which  assumption  classes  to  use;  (b)  deciding  how  to  model  the  chosen 
phenomena,  i.e.,  selecting  model  fragments  from  chosen  assumption  classes;  and  (c) 
ensuring  that  causal  models  satisfy  all  the  propositional  coherence  constraints.  In 
the  next  chapter,  we  will  use  this  knowledge  to  design  special  cases  of  the  MINIMAL 
CAUSAL MODEL problem that can be solved in polynomial time.

4.2.1 Problem size

Before we start the complexity analysis, we define the size of the input to the CAUSAL
MODEL and MINIMAL CAUSAL MODEL problems. The input to these problems is as
defined in Equation 4.1, reproduced here for ease of reference:

$$\mathcal{I} = (\mathcal{M}, \mathit{contradictory}, \mathit{approximation}, \mathcal{A}, \mathcal{C}, p, q)$$

We define the size of the input as the sum of:

1. $|\mathcal{M}|$, the number of model fragments in $\mathcal{M}$;

2. $|\mathcal{C}|$, the number of constraints in $\mathcal{C}$;

3. $|E(\mathcal{M})|$, the number of equations in the model fragments in $\mathcal{M}$; and

4. $|P(\mathcal{M})|$, the number of parameters used in the equations of the model fragments
in $\mathcal{M}$.

It is easy to see that the amount of space occupied by any reasonable encoding of $\mathcal{I}$
must be bounded by a polynomial function of the size of $\mathcal{I}$.^3 In particular, the number of tuples in
the $\mathit{contradictory}$ and $\mathit{approximation}$ relations is bounded by a quadratic function of
$|\mathcal{M}|$, and the number of assumption classes in $\mathcal{A}$ is bounded by $|\mathcal{M}|$. The complexity
analyses in this chapter and the next chapter are with respect to the above definition
of the size of a problem instance. In particular, the phrase “runs in polynomial time”
will often be used to mean “runs in time polynomial in the size of $\mathcal{I}$,” where the
instance $\mathcal{I}$ will be clear from the context.
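As a concrete reading of this definition, the size of an instance is a sum of four counts. The Python sketch below assumes a simple encoding in which each model fragment maps to a list of equations and each equation is a pair of a label and its set of parameters; it also assumes that an equation shared by several fragments is counted once. Both choices, and the sample data, are ours for illustration.

```python
def problem_size(fragments, constraints):
    """Size of an instance: |M| + |C| + |E(M)| + |P(M)|.

    fragments maps a fragment name to a list of equations, each a pair
    (label, frozenset_of_parameter_names)."""
    equations = {eq for eqs in fragments.values() for eq in eqs}
    parameters = {p for (_, params) in equations for p in params}
    return len(fragments) + len(constraints) + len(equations) + len(parameters)

# Hypothetical instance: two fragments, one constraint, two equations,
# and three distinct parameters.
fragments = {
    "Resistor(wire-1)": [("ohm", frozenset({"V", "i", "R"}))],
    "Constant-resistance(wire-1)": [("const-R", frozenset({"R"}))],
}
constraints = ["Constant-resistance(wire-1) => Resistor(wire-1)"]

size = problem_size(fragments, constraints)
print(size)
```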

4.2.2  Preliminaries 

We start the analysis by showing that the CAUSAL MODEL problem is in NP.

Lemma 4.1 The CAUSAL MODEL problem is in NP.

^3 Note that we have made the (reasonable) assumption that the amount of space used in encoding
each equation is bounded by a polynomial function of the number of parameters used in the equation.

Proof: To show that the CAUSAL MODEL problem is in NP, we need to show that a
nondeterministic algorithm can find a causal model in (nondeterministic) polynomial
time. Since there are a finite number of models, each of which can be generated in
polynomial time, it suffices to show that checking whether or not a model is a causal
model can be done in time polynomial in the size of $\mathcal{I}$.

Given $M \subseteq \mathcal{M}$, it is easy to check in polynomial time whether or not $M$ contains
mutually contradictory model fragments, and whether or not $M$ satisfies all the constraints
in $\mathcal{C}$. From the algorithms given in the previous chapter, it is also possible to
check in polynomial time whether or not the equations of $M$ are complete, and whether
or not $q$ causally depends on $p$ in the causal ordering generated from the equations
of $M$. Hence, the CAUSAL MODEL problem is in NP. □
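The verification step of the proof can be sketched as a single function that checks a candidate model against the conditions of Definitions 4.1 and 4.2. In the Python sketch below, the checks for completeness of equations and for causal dependence are passed in as callables, because they rely on algorithms from the previous chapter that are not reproduced here; the stubs and fragment names in the example are hypothetical.

```python
def is_causal_model(model, contradictory, constraints,
                    equations_complete, causally_depends):
    """Verify a candidate model: no contradictory pair (quadratic in |model|),
    complete equations, every constraint satisfied, and the expected
    behavior explained. The last two checks are caller-supplied."""
    if any((a, b) in contradictory for a in model for b in model):
        return False
    if not equations_complete(model):
        return False
    if not all(constraint(model) for constraint in constraints):
        return False
    return causally_depends(model)

# Hypothetical instance. The constraint encodes "b => (a1 or a2)".
contradictory = {("a1", "a2"), ("a2", "a1")}
constraints = [lambda m: ("b" not in m) or bool(m & {"a1", "a2"})]
equations_complete = lambda m: len(m) > 0      # stub for illustration
causally_depends = lambda m: "b" in m          # stub for illustration

for candidate in ({"a1", "b"}, {"a1", "a2", "b"}, {"b"}):
    print(sorted(candidate),
          is_causal_model(candidate, contradictory, constraints,
                          equations_complete, causally_depends))
```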

We  now  show  that  the  CAUSAL  MODEL  problem  is  NP-hard.  We  will  give  three 
different  proofs  of  this  result.  In  each  proof,  we  will  introduce  a  subclass  of  the 
instances  of  the  CAUSAL  MODEL  problem,  and  show  that  even  if  we  restrict  ourselves 
to  solving  just  the  problem  instances  in  that  subclass,  the  CAUSAL  MODEL  problem 
is  NP-hard.  This  will  allow  us  to  identify  three  different  sources  of  intractability. 
The  NP-hardness  of  the  general  CAUSAL  MODEL  problem  is,  of  course,  an  immediate 
consequence  of  the  NP-hardness  of  any  of  the  three  subclasses. 

In  each  of  the  subclasses  of  the  CAUSAL  MODEL  problem  we  will  restrict  the 
contradictory  relation  to  be  a  relation  that  partitions  the  set  of  model  fragments  into 
the  set  of  assumption  classes,  i.e.,  two  model  fragments  are  in  the  same  assumption 
class  if  and  only  if  they  are  mutually  contradictory: 

$(\forall m_1, m_2 \in \mathcal{M})\; m_1 \neq m_2 \Rightarrow (contradictory(m_1, m_2) \equiv (\exists A \in \mathcal{A})\; m_1, m_2 \in A)$  (4.10)

A  consequence  of  the  above  restriction  is  that  we  can  conceptually  view  the  problem 
of  finding  a  causal  model  as  one  involving  the  following  two  steps:  (a)  selecting  a  set 
of  assumption  classes;  and  (b)  selecting  a  single  model  fragment  from  each  selected 
assumption  class.  Intuitively,  this  corresponds  to  deciding  which  phenomena  to  model 
(step  (a)),  and  then  deciding  how  to  model  the  chosen  phenomena  (step  (b)). 
4.2.3  The SELECT MODEL FRAGMENTS problem

The first special case of the CAUSAL MODEL problem consists of those instances of
the problem that satisfy the following two conditions: (a) the instance has no
propositional coherence constraints; and (b) every causal model of the instance includes
a model fragment from each assumption class. Hence, this special case allows us to
identify the first source of intractability: choosing a model fragment from each
assumption class in a set of selected assumption classes is intractable. More abstractly,
even if we knew exactly which phenomena we wanted to model, deciding how to model
the chosen phenomena is intractable.

Definition 4.6 (SELECT MODEL FRAGMENTS) This problem is the special case of
the CAUSAL MODEL problem which includes exactly those instances of the CAUSAL
MODEL problem in which (a) the contradictory relation partitions the set $\mathcal{M}$ of model
fragments into the set $\mathcal{A}$ of assumption classes; (b) $\mathcal{C} = \emptyset$; and (c) every causal model
of the instance includes a model fragment from each assumption class, i.e., if $M \subseteq \mathcal{M}$
is a causal model and $A \in \mathcal{A}$ is an assumption class, then $M \cap A \neq \emptyset$.

We  now  show  that  the  above  special  case  is  NP-hard.  The  proof  of  this  lemma  is 
based  on  a  reduction  from  the  ONE-IN-THREE  3SAT  problem,  a  variation  of  the  more 
common 3SAT problem in which an acceptable truth assignment must satisfy exactly
one  literal  in  each  clause.  Briefly,  the  reduction  introduces  a  model  fragment  for  each 
literal  in  an  instance  of  ONE-IN-THREE  3SAT,  with  model  fragments  corresponding 
to  complementary  literals  being  placed  in  the  same  assumption  class.  The  mapping 
between  truth  assignments  and  models  is  straightforward:  a  literal  is  true  if  and  only 
if  the  corresponding  model  fragment  is  in  the  model.  Equations  are  assigned  to  model 
fragments  to  ensure  that  a  model  is  a  causal  model  if  and  only  if  the  corresponding 
truth  assignment  assigns  exactly  one  true  literal  to  each  clause. 

Lemma  4.2  The  SELECT  MODEL  FRAGMENTS  problem  is  NP-hard. 

Proof: To show that the SELECT MODEL FRAGMENTS problem is NP-hard, we reduce
an arbitrary instance of the ONE-IN-THREE 3SAT problem to an instance of the SELECT
MODEL FRAGMENTS problem. An instance of the ONE-IN-THREE 3SAT problem is
defined as follows:

Definition 4.7 (ONE-IN-THREE 3SAT) Let $U = \{u_1, u_2, \ldots, u_n\}$ be a set of $n$ boolean
variables, and $C = \{c_1, c_2, \ldots, c_m\}$ a set of $m$ clauses over $U$, such that each clause
$c_i \in C$, $1 \le i \le m$, has $|c_i| = 3$. Is there a truth assignment for $U$ such that each
clause in $C$ has exactly one true literal?
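
The acceptance condition in Definition 4.7 can be made concrete with a short sketch. The representation below is an assumption made for illustration only: a literal is a pair `(variable, is_positive)`, and the function names are hypothetical. The search is a plain brute force over all $2^n$ assignments, so it illustrates the problem statement and is not an efficient algorithm.

```python
from itertools import product


def satisfies_one_in_three(clauses, assignment):
    """Return True if every clause has exactly one true literal.

    A literal is a pair (variable, is_positive); `assignment` maps each
    variable name to a bool. This encoding is an illustrative assumption.
    """
    for clause in clauses:
        true_count = sum(assignment[var] == positive for var, positive in clause)
        if true_count != 1:
            return False
    return True


def one_in_three_3sat(variables, clauses):
    """Brute-force search over all 2^n truth assignments (exponential time)."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies_one_in_three(clauses, assignment):
            return assignment
    return None
```

For example, the two clauses $(u_1, u_2, u_3)$ and $(\bar{u}_1, \bar{u}_2, \bar{u}_3)$ together have no acceptable truth assignment: the first requires exactly one true variable, the second exactly two.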

The  ONE-IN-THREE  3SAT  problem  is  shown  to  be  NP-complete  in  [Schaefer,  1978]. 
We now reduce an arbitrary instance

$I_1 = (U, C)$

of the ONE-IN-THREE 3SAT problem to an instance

$I_2 = (\mathcal{M}, contradictory, approximation, \mathcal{A}, \mathcal{C}, p, q)$

of the SELECT MODEL FRAGMENTS problem as follows.

Introduce a model fragment $m_l$ for each literal $l$ in $I_1$, and a model fragment $m$:

$\mathcal{M} = \{m_{u_i} : 1 \le i \le n\} \cup \{m_{\bar{u}_i} : 1 \le i \le n\} \cup \{m\}$

Let $m_l$ and $m_{\bar{l}}$ be contradictory, where $l$ and $\bar{l}$ are complementary literals:

$contradictory(m_{u_i}, m_{\bar{u}_i})$, for $1 \le i \le n$

Note that contradictory partitions $\mathcal{M}$ into a set of mutually consistent assumption
classes, with $m$ being in its own assumption class. This defines $\mathcal{A}$, the set of
assumption classes:

$\mathcal{A} = \{\{m_{u_i}, m_{\bar{u}_i}\} : 1 \le i \le n\} \cup \{\{m\}\}$

Let approximation be the empty relation, so that no model fragment is an
approximation of any other model fragment, and let $\mathcal{C} = \emptyset$.

Introduce the set $\mathcal{P}$ of $(m + n + 3)$ parameters:

$\mathcal{P} = \{P_0, P_1, \ldots, P_{m+n+2}\}$

We let $p = P_0$ and $q = P_{m+n+2}$. Next, we introduce the set $E$ of $(3m + 2n + 3)$
equations:

$E = \left(\bigcup_{1 \le j \le m} E_j\right) \cup \left(\bigcup_{1 \le i \le n} F_i\right) \cup G$

where $E_j$ contains an equation for each literal in clause $c_j$, $F_i$ contains an equation
for literals $u_i$ and $\bar{u}_i$, and $G$ contains three equations, as follows:

$E_j = \{e_{jl} : l \text{ is a literal in clause } c_j\}$

$F_i = \{f_{u_i}, f_{\bar{u}_i}\}$

$G = \{g_1, g_2, g_3\}$

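The parameters of the equations in $E$ are defined as follows. If $e \in E$, then

$$P(e) = \begin{cases}
\{P_j, P_{j+1}\} & \text{if } e \in E_j,\ 1 \le j \le m \\
\{P_{m+i}, P_{m+i+1}\} & \text{if } e \in F_i,\ 1 \le i \le n \\
\{P_0\} & \text{if } e = g_1 \\
\{P_0, P_1\} & \text{if } e = g_2 \\
\{P_{m+n+1}, P_{m+n+2}\} & \text{if } e = g_3
\end{cases}$$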

For each $e \in E$, let $P_c(e) = P(e)$. The equations in the model fragments of $\mathcal{M}$ are
defined as follows:

$E(m_l) = \{e_{jl} : \text{literal } l \text{ is in clause } c_j\} \cup \{f_l\}$

$E(m) = \{g_1, g_2, g_3\}$
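
The construction above can be sketched in code. This is a minimal illustration under stated assumptions, not the dissertation's implementation: variables are numbered $1..n$, a literal is a pair `(i, positive)`, parameters are represented by their subscripts, and the labels `"e"`, `"f"`, `"g1"`, `"g2"`, `"g3"` as well as the function names are hypothetical. It also assumes that each clause contains three distinct literals.

```python
def build_select_model_fragments_instance(n, clauses):
    """Build the parameters, equations, fragments and assumption classes of I_2.

    Illustrative encoding: a literal is (i, positive) with 1 <= i <= n, and
    each clause is a list of three distinct literals.
    """
    m = len(clauses)
    parameters = [f"P{k}" for k in range(m + n + 3)]  # P_0 .. P_{m+n+2}

    # params_of maps each equation to the subscripts of the parameters it uses.
    params_of = {"g1": {0}, "g2": {0, 1}, "g3": {m + n + 1, m + n + 2}}
    for j, clause in enumerate(clauses, start=1):
        for lit in clause:
            params_of[("e", j, lit)] = {j, j + 1}  # equations in E_j
    for i in range(1, n + 1):
        for positive in (True, False):
            params_of[("f", (i, positive))] = {m + i, m + i + 1}  # equations in F_i

    # One model fragment per literal, plus the extra fragment m.
    fragments = {"m": {"g1", "g2", "g3"}}
    for i in range(1, n + 1):
        for positive in (True, False):
            lit = (i, positive)
            eqs = {("e", j, lit) for j, c in enumerate(clauses, start=1) if lit in c}
            eqs.add(("f", lit))
            fragments[lit] = eqs

    # Assumption classes: complementary literals share a class; m is alone.
    classes = [{(i, True), (i, False)} for i in range(1, n + 1)] + [{"m"}]
    return parameters, params_of, fragments, classes


def equations_of_model(fragments, model):
    """E(M): the union of the equations of the fragments in the model."""
    return set().union(*(fragments[name] for name in model))
```

With $m$ clauses and $n$ variables, the sketch produces $m + n + 3$ parameters and $3m + 2n + 3$ equations, matching the counts stated in the text, and the work done is clearly polynomial in $m$ and $n$.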

That completes the reduction. Clearly, the reduction can be done in polynomial
time. We now show that $I_2$ is, indeed, an instance of the SELECT MODEL FRAGMENTS
problem. Since $\mathcal{C} = \emptyset$ in $I_2$, we need only show that every causal model of $I_2$ contains
a model fragment from each assumption class.

Let $M$ be any causal model of $I_2$. We first show that $P(M) = \mathcal{P}$. If $P(M) \neq \mathcal{P}$,
then there exists some parameter $P_k \in \mathcal{P}$, $1 \le k \le (m + n + 1)$, with $P_k \notin P(M)$
($P_0$ and $P_{m+n+2}$ must, of course, be in $P(M)$). Since each equation in $E$, except $g_1$,
relates two parameters whose subscripts differ by 1, it follows that no equation relates
parameters with subscripts less than $k$ to parameters with subscripts greater than $k$.
Hence, no parameter with subscript less than $k$ can be related to any parameter with
subscript greater than $k$, and hence $P_0$ and $P_{m+n+2}$ are unrelated, contradicting the
fact that $M$ is a causal model. Hence $P(M) = \mathcal{P}$.

Footnote (on the equations introduced above): The equations that we introduce in this
proof, and in all the other proofs in this chapter, will not contain any differential
equations. Hence,  E  =  ic(P).

Next, we show that $E(M)$ contains exactly one equation from each $E_j$, $1 \le j \le m$,
exactly one equation from each $F_i$, $1 \le i \le n$, and the three equations in $G$. First, we
show that $E(M)$ must contain at least one equation from $E_j$, $1 \le j \le m$. If $E(M)$
contains no equation from $E_j$ for some $j$, $1 \le j \le m$, then $E(M)$ contains no equation
that relates a parameter with subscript less than or equal to $j$ to a parameter with
subscript greater than or equal to $j + 1$. This follows from the facts that all equations,
except $g_1$, relate two parameters whose subscripts differ by 1, and the only equations
that relate $P_j$ to $P_{j+1}$ are found in $E_j$. Hence, $P_0$ and $P_{m+n+2}$ are unrelated, violating
the fact that $M$ is a causal model. Hence, $E(M)$ contains at least one equation from
each $E_j$, $1 \le j \le m$. A similar argument shows that $E(M)$ contains at least one
equation from each $F_i$, $1 \le i \le n$, and that $E(M)$ must contain $g_2$ and $g_3$ (and hence
$g_1$). Hence $E(M)$ contains at least $(m + n + 3)$ equations. But since $M$ is complete,
$|E(M)| = |P(M)| = (m + n + 3)$, and hence $E(M)$ contains exactly one equation
from each $E_j$, $1 \le j \le m$, exactly one equation from each $F_i$, $1 \le i \le n$, and the
three equations in $G$.

Recall that the assumption classes of $I_2$ are the following:

$\{\{m_{u_i}, m_{\bar{u}_i}\} : 1 \le i \le n\} \cup \{\{m\}\}$

Since $M$ contains the three equations in $G$, $M$ contains the model fragment $m$. Now
we show that $M$ contains a model fragment from each of the other assumption classes,
i.e., for each $i$, $1 \le i \le n$, $M$ contains one of $m_{u_i}$ or $m_{\bar{u}_i}$. Since $M$ is consistent, at
most one of $m_{u_i}$ and $m_{\bar{u}_i}$ is in $M$. Since $E(M)$ contains an equation from $F_i$, at least
one of $m_{u_i}$ and $m_{\bar{u}_i}$ is in $M$. Hence, exactly one of $m_{u_i}$ and $m_{\bar{u}_i}$ is in $M$. Hence, $I_2$
is indeed an instance of the SELECT MODEL FRAGMENTS problem.

We now show that $I_1$ has an acceptable truth assignment if and only if $I_2$ has a
causal model.

($\Rightarrow$) Suppose that $I_1$ has an acceptable truth assignment on $U$. Let $M$ be the
following model:

$M = \{m_{u_i} : u_i \text{ is true}\} \cup \{m_{\bar{u}_i} : u_i \text{ is false}\} \cup \{m\}$

We claim that $M$ is a causal model. First, $M$ contains no mutually contradictory
model fragments, because if $M$ did contain mutually contradictory model fragments,
then for some $i$, $1 \le i \le n$, we have $m_{u_i}, m_{\bar{u}_i} \in M$. But this means that $u_i$ is both
true and false, which is impossible.

To show that the set of equations of $M$ is complete, it suffices to show, by
Lemma 3.2, that there is an onto causal mapping from $E(M)$ to $P(M)$. We start
by claiming that $E(M)$ contains exactly $(m + n + 3)$ equations, one from each
$E_j$, $1 \le j \le m$, one from each $F_i$, $1 \le i \le n$, and the three equations in $G$. Since
$m \in M$, it follows that $g_1, g_2, g_3 \in E(M)$. Since $u_i$, $1 \le i \le n$, is either true or false,
$M$ contains exactly one of $m_{u_i}$ or $m_{\bar{u}_i}$, and hence $E(M)$ contains exactly one of $f_{u_i}$
or $f_{\bar{u}_i}$. Hence, $E(M)$ contains exactly one equation from each $F_i$, $1 \le i \le n$. Finally,
let $l_j$ be the single true literal in clause $c_j$, $1 \le j \le m$. This means that $m_{l_j} \in M$ and
hence $e_{j l_j} \in E(M)$. Note that if $l'_j$ is a literal in $c_j$ that is not true, it follows that
$m_{l'_j} \notin M$ and hence $e_{j l'_j} \notin E(M)$. Hence, $E(M)$ contains exactly one equation from
each $E_j$, $1 \le j \le m$. Hence, $E(M)$ contains exactly $(m + n + 3)$ equations.

We now create a 1-1 mapping from the $(m + n + 3)$ equations of $E(M)$ to the
parameters of $\mathcal{P}$. Note that if such a mapping is possible, $|P(M)| = (m + n + 3)$,
and hence $P(M) = \mathcal{P}$, in which case the mapping must be an onto mapping, and we
are done. Map $g_1$ to $P_0$, $g_2$ to $P_1$, and $g_3$ to $P_{m+n+2}$. Map the representative of $E_j$
to $P_{j+1}$, $1 \le j \le m$. Map the representative of $F_i$ to $P_{m+i+1}$, $1 \le i \le n$. It is easy
to verify that this mapping is a valid 1-1 mapping. Figure 4.1 shows this matching.
Hence, $M$ is complete.

Finally, we show that $M$ satisfies the expected behavior. The mapping in
Figure 4.1 shows that (a) $P_1$ depends on $P_0$ because $P_1$ is matched to $g_2$ and $g_2$ uses
$P_0$; (b) for each $j$, $1 \le j \le m$, $P_{j+1}$ depends on $P_j$ because $P_{j+1}$ is mapped to $e_{j l_j}$
which uses $P_j$; (c) for each $i$, $1 \le i \le n$, $P_{m+i+1}$ depends on $P_{m+i}$ because $P_{m+i+1}$ is
mapped to $f_{r_i}$ which uses $P_{m+i}$; and (d) $P_{m+n+2}$ depends on $P_{m+n+1}$ because $P_{m+n+2}$
is matched to $g_3$ which uses $P_{m+n+1}$. We can put all these dependency links together
to infer that $P_{m+n+2}$ ($= q$) causally depends on $P_0$ ($= p$), and hence $M$ satisfies the
expected behavior. Hence, we have proved that $M$ is a causal model.

    P_0   P_1   P_2         ...   P_{m+1}     P_{m+2}   ...   P_{m+n+1}   P_{m+n+2}
     |     |     |                 |           |               |           |
    g_1   g_2   e_{1 l_1}   ...   e_{m l_m}   f_{r_1}   ...   f_{r_n}     g_3

    l_j is the true literal in clause c_j, 1 <= j <= m
    r_i is u_i if u_i is true, and the complement of u_i otherwise, 1 <= i <= n

Figure 4.1: Mapping of parameters to equations

($\Leftarrow$) We now prove that if a causal model exists in $I_2$, then there is an acceptable
truth assignment in $I_1$. Let $M$ be any causal model of $I_2$. We use $M$ to construct
an acceptable truth assignment for $I_1$ as follows: for each $u_i \in U$, $1 \le i \le n$, let $u_i$ be
true if $m_{u_i} \in M$, and $u_i$ be false if $m_{\bar{u}_i} \in M$. This truth assignment assigns a unique
truth value to each $u_i \in U$, $1 \le i \le n$, since we saw earlier that $M$ contains exactly
one of $m_{u_i}$ and $m_{\bar{u}_i}$.

Next, we prove that each clause $c_j \in C$, $1 \le j \le m$, has exactly one true literal.
To prove this, we prove that a literal $l$ in clause $c_j$ is true if and only if equation
$e_{jl} \in E(M)$. If $e_{jl} \in E(M)$, it follows that $m_l \in M$, and hence $l$ is true. On the
other hand, if $l$ is true, then $m_l \in M$, and hence $e_{jl} \in E(M)$. However, we know that
$E(M)$ contains exactly one equation from $E_j$, and hence exactly one literal in $c_j$ is
true. Hence, the truth assignment is acceptable.

Hence, we have shown that $I_1$ has an acceptable truth assignment if and only if
$I_2$ has a causal model. Hence, the SELECT MODEL FRAGMENTS problem is NP-hard.
□ 

4.2.4  The  SELECT  ASSUMPTION  CLASSES  problem 

The  second  special  case  of  the  CAUSAL  MODEL  problem  consists  of  those  instances 
of  the  problem  that  satisfy  the  following  two  conditions:  (a)  the  instance  still  has  no 
propositional  coherence  constraints;  and  (b)  each  assumption  class  has  exactly  one 
model fragment. Since each assumption class contains exactly one model fragment,
any causal model can be viewed as merely selecting a set of assumption classes. Hence,
this special case identifies the second source of intractability: deciding which
assumption classes to select is intractable. More abstractly, deciding which phenomena we
want to model is itself intractable.

Definition 4.8 (SELECT ASSUMPTION CLASSES) This problem is the special case of
the CAUSAL MODEL problem which includes exactly those instances of the CAUSAL
MODEL problem in which (a) the contradictory relation partitions the set $\mathcal{M}$ of model
fragments into the set $\mathcal{A}$ of assumption classes; (b) $\mathcal{C} = \emptyset$; and (c) every assumption
class in $\mathcal{A}$ contains exactly one model fragment, i.e., the contradictory relation is the
empty relation.

We  now  show  that  the  SELECT  ASSUMPTION  CLASSES  problem  is  NP-hard.  The 
proof  is  a  minor  variation  of  the  proof  of  Lemma  4.2. 

Lemma  4.3  The  SELECT  ASSUMPTION  CLASSES  problem  is  NP-hard. 

Proof: The proof of this lemma is a minor variation of the proof of Lemma 4.2. In
the proof of Lemma 4.2, we reduced an arbitrary instance $I_1$ of the ONE-IN-THREE
3SAT problem to an instance $I_2$ of the SELECT MODEL FRAGMENTS problem. Here
we reduce $I_1$ to an instance $I_2'$ of the SELECT ASSUMPTION CLASSES problem. $I_2'$ is
the same as $I_2$, except for the following differences.

In $I_2$, model fragments corresponding to complementary literals were made
mutually contradictory. In contrast, in $I_2'$, we make the contradictory relation the empty
relation, i.e., there are no mutually contradictory model fragments. Hence, each
assumption class in $I_2'$ contains exactly one model fragment.

The second difference is that, in $I_2'$, we add an equation to each model fragment
as follows. In particular, we introduce $n$ new parameters:

$\mathcal{Q} = \{Q_1, Q_2, \ldots, Q_n\}$

and $2n$ new equations:

$H = \{h_1, h_2, \ldots, h_n\} \cup \{\bar{h}_1, \bar{h}_2, \ldots, \bar{h}_n\}$

The parameters of the equations in $H$ are defined as follows:

$P(h_i) = P_c(h_i) = P(\bar{h}_i) = P_c(\bar{h}_i) = \{Q_i\}$, for $1 \le i \le n$

i.e., both $h_i$ and $\bar{h}_i$ use only the parameter $Q_i$.

We add the equation $h_i$ to model fragment $m_{u_i}$, and the equation $\bar{h}_i$ to model
fragment $m_{\bar{u}_i}$, for $1 \le i \le n$. As a consequence, no consistent model of $I_2'$ can
include both $m_{u_i}$ and $m_{\bar{u}_i}$. This is because the equations of a model that includes
both $m_{u_i}$ and $m_{\bar{u}_i}$ would include equations $h_i$ and $\bar{h}_i$, and the equations would be
overconstrained (see Definition 3.4 with $S = \{h_i, \bar{h}_i\}$).

Hence, effectively, model fragments $m_{u_i}$ and $m_{\bar{u}_i}$ behave as though they were
contradictory. Hence, $I_2$ and $I_2'$ have the same causal models. Hence, $I_1$ has an
acceptable truth assignment if and only if $I_2'$ has a causal model. Hence, the SELECT
ASSUMPTION CLASSES problem is NP-hard. □

Another view of the results of Lemmas 4.2 and 4.3 is that the fundamental source
of intractability is that a causal model must choose at most one of the model fragments
$m_{u_i}$ and $m_{\bar{u}_i}$: in Lemma 4.2, the choice is enforced by making $m_{u_i}$ and $m_{\bar{u}_i}$ mutually
contradictory; in Lemma 4.3, the choice is enforced by assigning equations to $m_{u_i}$ and
$m_{\bar{u}_i}$ such that a model that contains both of them becomes overconstrained. This
suggests that other ways of enforcing such a choice would also lead to intractability.
In particular, if $\mathcal{C}$ contained constraints of the form

$\neg m_{u_i} \vee \neg m_{\bar{u}_i}$  (4.11)

then any coherent model would have to choose at most one of $m_{u_i}$ and $m_{\bar{u}_i}$, leading
to intractability. Hence, allowing propositional coherence constraints of the above
form (also called negative clauses) is yet another source of intractability. In the next
section, we show that other very simple types of propositional coherence constraints
can lead to intractability.

4.2.5  The  SATISFY  CONSTRAINTS  problem 

The  third  special  case  of  the  CAUSAL  MODEL  problem  consists  of  those  instances  of 
the  problem  that  satisfy  the  following  two  conditions:  (a)  as  in  the  first  case,  every 
causal model of the instance includes a model fragment from each assumption class;
(b) model fragments in the same assumption class have the same sets of equations;
and (c) $\mathcal{C}$ contains only definite Horn clauses (a definite Horn clause is a disjunction
of literals with exactly one positive literal). Conditions (a) and (b) ensure that, if $\mathcal{C}$
were empty, then finding a causal model would be trivial: a causal model exists if and
only if selecting an arbitrary model fragment from each assumption class leads to a
causal model. Hence, the intractability of this problem stems from the causal model
having to satisfy the constraints in $\mathcal{C}$, even when the constraints are restricted to be
definite Horn clauses.

Definition 4.9 (SATISFY CONSTRAINTS) This problem is the special case of the
CAUSAL MODEL problem which includes exactly those instances of the CAUSAL MODEL
problem in which (a) the contradictory relation partitions the set $\mathcal{M}$ of model
fragments into the set $\mathcal{A}$ of assumption classes; (b) every causal model of the instance
includes a model fragment from each assumption class; (c) model fragments in the
same assumption class have the same sets of equations; and (d) $\mathcal{C}$ contains only
definite Horn clauses.

We  now  show  that  the  SATISFY  CONSTRAINTS  problem  is  NP-hard. 

Lemma  4.4  The  SATISFY  CONSTRAINTS  problem  is  NP-hard. 

Proof: Once again the proof is based on a reduction from the ONE-IN-THREE 3SAT
problem, i.e., we will reduce an arbitrary instance

$I_1 = (U, C)$

of the ONE-IN-THREE 3SAT problem (see Definition 4.7) to an instance

$I_2 = (\mathcal{M}, contradictory, approximation, \mathcal{A}, \mathcal{C}, p, q)$

of the SATISFY CONSTRAINTS problem. Introduce a model fragment $m_l$ for each
literal $l$ in $I_1$:

$\mathcal{M} = \{m_{u_i} : 1 \le i \le n\} \cup \{m_{\bar{u}_i} : 1 \le i \le n\}$
Let $m_l$ and $m_{\bar{l}}$ be contradictory, where $l$ and $\bar{l}$ are complementary literals:

$contradictory(m_{u_i}, m_{\bar{u}_i})$, for $1 \le i \le n$

Note that contradictory partitions $\mathcal{M}$ into a set $\mathcal{A}$ of assumption classes:

$\mathcal{A} = \{\{m_{u_i}, m_{\bar{u}_i}\} : 1 \le i \le n\}$

Let approximation be the empty relation. Introduce the set $\mathcal{P}$ of $(n + 1)$ parameters:

$\mathcal{P} = \{P_0, P_1, \ldots, P_n\}$

Let $p = P_0$, and $q = P_n$. Next, introduce the set $E$ of $n + 1$ equations:

$E = \{e_0, e_1, \ldots, e_n\}$

The parameters of these equations are defined as follows:

$P(e_0) = \{P_0\}$

$P(e_i) = P_c(e_i) = \{P_{i-1}, P_i\}$ for $1 \le i \le n$

i.e., each equation (except $e_0$) relates a pair of consecutively numbered parameters.
Assign the equations to the model fragments as follows:

$E(m_{u_1}) = E(m_{\bar{u}_1}) = \{e_0, e_1\}$

$E(m_{u_i}) = E(m_{\bar{u}_i}) = \{e_i\}$ for $2 \le i \le n$

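Note that model fragments in the same assumption class are assigned the same set of
equations. Finally, we introduce the set $\mathcal{C}$ of $3m$ propositional coherence constraints.
$\mathcal{C}$ will contain 3 constraints from each clause in $C$. Let us assume that the three
literals in clause $c_j$ are named $l_{j1}$, $l_{j2}$, and $l_{j3}$, $1 \le j \le m$. The $3m$ constraints in $\mathcal{C}$
are defined as follows:

$$\mathcal{C} = \bigcup_{1 \le j \le m} \{\, (m_{\bar{l}_{j1}} \wedge m_{\bar{l}_{j2}}) \equiv m_{l_{j3}},\;
(m_{\bar{l}_{j2}} \wedge m_{\bar{l}_{j3}}) \equiv m_{l_{j1}},\;
(m_{\bar{l}_{j3}} \wedge m_{\bar{l}_{j1}}) \equiv m_{l_{j2}} \,\} \qquad (4.12)$$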
where the literal $\bar{l}$ is the complement of the literal $l$, i.e., if $l$ is $u_i$ then $\bar{l}$ is $\bar{u}_i$ and vice
versa. Note that a constraint of the form

$(m_1 \wedge m_2) \equiv m_3$

is equivalent to the following three definite Horn clauses:

$\neg m_1 \vee \neg m_2 \vee m_3$

$\neg m_3 \vee m_1$

$\neg m_3 \vee m_2$
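
The equivalence between the constraint and its three definite Horn clauses can be confirmed by enumerating all eight truth assignments. The following is a small sketch; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import product


def equivalence(m1, m2, m3):
    """The constraint (m1 AND m2) <-> m3."""
    return (m1 and m2) == m3


def horn_clauses(m1, m2, m3):
    """Conjunction of the three definite Horn clauses."""
    return ((not m1 or not m2 or m3)
            and (not m3 or m1)
            and (not m3 or m2))


# The two formulations agree on every assignment to (m1, m2, m3).
agree = all(equivalence(*v) == horn_clauses(*v)
            for v in product([False, True], repeat=3))
```

Each of the three clauses has exactly one positive literal, so each is a definite Horn clause in the sense used in the text.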


That completes the reduction. Clearly, the reduction can be done in polynomial time.

We now show that $I_2$ is, in fact, an instance of the SATISFY CONSTRAINTS
problem. Conditions (a), (c), and (d) in Definition 4.9 are straightforward to verify. Hence,
we only need to show that every causal model of $I_2$ contains a model fragment from
each assumption class in $\mathcal{A}$. Let $M \subseteq \mathcal{M}$ be a causal model of $I_2$.

First, we show that $P(M) = \mathcal{P}$, i.e., the equations of $M$ contain all the parameters
in $\mathcal{P}$. Since $M$ is a causal model, $P(M)$ must contain $P_0$ ($= p$) and $P_n$ ($= q$). Suppose
$P(M)$ does not contain $P_i$, $0 < i < n$. Since equations in the model fragments of $\mathcal{M}$
relate only consecutively numbered parameters (except $e_0$ which contains only one
parameter), it follows that no equation relates parameters numbered less than $i$ to
parameters numbered greater than $i$. Hence, $P_0$ and $P_n$ are unrelated, and hence $M$
is not a causal model. Hence, $P(M)$ contains all the parameters in $\mathcal{P}$.

Since $P(M) = \mathcal{P}$, it follows that $|P(M)| = (n + 1)$. Since $M$ is complete, it
follows that $|E(M)| = (n + 1)$. However, note that each model fragment has exactly
one equation, except $m_{u_1}$ and $m_{\bar{u}_1}$, which have two. Since $m_{u_1}$ and $m_{\bar{u}_1}$ are in the
same assumption class, it follows that the only way $E(M)$ can have $(n + 1)$ equations
is if $M$ contains a model fragment from each assumption class. Hence, every causal
model of $I_2$ contains a model fragment from each assumption class. Hence, $I_2$ is an
instance of the SATISFY CONSTRAINTS problem.

Now we show that $I_1$ has an acceptable truth assignment if and only if $I_2$ has a
causal model.

($\Rightarrow$) Suppose that $I_1$ has an acceptable truth assignment on $U$. Let $M$ be the
following model:

$M = \{m_{u_i} : u_i \text{ is true}\} \cup \{m_{\bar{u}_i} : u_i \text{ is false}\}$

We claim that $M$ is a causal model. First, $M$ contains no mutually contradictory
model fragments, because if $M$ did contain mutually contradictory model fragments,
then for some $i$, $1 \le i \le n$, we have $m_{u_i}, m_{\bar{u}_i} \in M$. But this means that $u_i$ is both
true and false, which is impossible.

Next, we show that $M$ contains a model fragment from each assumption class.
This is a direct consequence of the fact that each $u_i$, $1 \le i \le n$, is either true or false.
Hence, either $m_{u_i}$ or $m_{\bar{u}_i}$ is in $M$. Hence, $M$ contains a model fragment from each
assumption class.

Since $M$ contains a model fragment from each assumption class, it follows that
$E(M) = E$. It is easy to verify that $E$ is a complete set of equations, and that $P_n$
causally depends on $P_0$ in the causal ordering generated from $E$.

Finally, we show that all the constraints in $\mathcal{C}$ are satisfied. Since, for any literal $l$,
$m_l$ and $m_{\bar{l}}$ are contradictory, it is easy to see that the constraints in $\mathcal{C}$ are satisfied by
$M$ if and only if $M$ contains exactly one model fragment from each of the following sets:

$\{m_{l_{j1}}, m_{l_{j2}}, m_{l_{j3}}\}$, $1 \le j \le m$

where $l_{j1}$, $l_{j2}$, and $l_{j3}$ are the three literals in clause $c_j$. For example, if $M$ does not
contain $m_{l_{j1}}$ and $m_{l_{j2}}$, for some $1 \le j \le m$, then the constraint

$(m_{\bar{l}_{j1}} \wedge m_{\bar{l}_{j2}}) \equiv m_{l_{j3}}$

is satisfied if and only if $m_{l_{j3}} \in M$. Similarly, if $M$ contains $m_{l_{j3}}$, then the above
constraint is satisfied if and only if $M$ does not contain $m_{l_{j1}}$ and $m_{l_{j2}}$. Since we started
with an acceptable truth assignment, it follows that exactly one of the literals in

$\{l_{j1}, l_{j2}, l_{j3}\}$

for each $1 \le j \le m$, is true. Hence, for each $1 \le j \le m$, exactly one model fragment
in

$\{m_{l_{j1}}, m_{l_{j2}}, m_{l_{j3}}\}$

is in $M$. Hence, all the constraints in $\mathcal{C}$ are satisfied. Hence, $M$ is a causal model.
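
The claim that the three constraints of (4.12) for a clause hold exactly when one of its literals is true can be checked exhaustively. The sketch below makes two simplifying assumptions for illustration: the three literals of the clause are over distinct variables, so their truth values vary independently, and the model contains exactly one fragment from each assumption class, so the fragment of a complementary literal is present exactly when the literal is false. The function name is hypothetical.

```python
from itertools import product


def constraints_hold(l1, l2, l3):
    """Evaluate the three constraints of (4.12) for one clause.

    l1, l2, l3 say whether the fragments of the clause's literals are in M;
    the fragment of a complementary literal is in M when the literal is false.
    """
    n1, n2, n3 = (not l1), (not l2), (not l3)
    return (((n1 and n2) == l3)
            and ((n2 and n3) == l1)
            and ((n3 and n1) == l2))


# The constraints hold exactly when one of the three literals is true.
exactly_one_matches = all(
    constraints_hold(l1, l2, l3) == (l1 + l2 + l3 == 1)
    for l1, l2, l3 in product([False, True], repeat=3)
)
```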

($\Leftarrow$) We now prove that if $I_2$ has a causal model, then there is an acceptable truth
assignment for $I_1$. Let $M$ be a causal model of $I_2$. We use $M$ to construct a truth
assignment for $I_1$ as follows: for each $u_i \in U$, $1 \le i \le n$, let $u_i$ be true if $m_{u_i} \in M$,
and $u_i$ be false if $m_{\bar{u}_i} \in M$. We now prove that this is an acceptable truth assignment
for $I_1$.

First, we note that the truth assignment assigns exactly one truth value to every
variable in $U$. This is a direct consequence of the fact that $M$ contains a model
fragment from each assumption class in $\mathcal{A}$ (shown earlier). Hence, $M$ contains exactly
one of $m_{u_i}$ and $m_{\bar{u}_i}$, for every $1 \le i \le n$. Hence, $u_i$ is assigned either true or false for
each $u_i \in U$.

Next, we show that the truth assignment is such that each clause $c_j \in C$ has
exactly one true literal. Let $c_j = \{l_{j1}, l_{j2}, l_{j3}\}$, $1 \le j \le m$. We have already seen that
the constraints in $\mathcal{C}$ have been constructed to ensure that every causal model selects
exactly one model fragment from the set

$\{m_{l_{j1}}, m_{l_{j2}}, m_{l_{j3}}\}$

for each $1 \le j \le m$. Hence, the corresponding truth assignment assigns true to
exactly one literal in each

$\{l_{j1}, l_{j2}, l_{j3}\}$

for $1 \le j \le m$. Hence, the truth assignment is an acceptable truth assignment.
Hence, if $I_2$ contains a causal model, $I_1$ contains an acceptable truth assignment.

Hence, $I_1$ contains an acceptable truth assignment if and only if $I_2$ contains a
causal model. Hence, the SATISFY CONSTRAINTS problem is NP-hard. □

4.2.6  The  intractability  of  finding  causal  models 

An  immediate  consequence  of  the  above  three  lemmas  is  the  intractability  of  the 
CAUSAL  MODEL  problem. 


Theorem  4.1  The  CAUSAL  MODEL  problem  is  NP-complete. 
Proof: Since the SELECT MODEL FRAGMENTS problem, the SELECT ASSUMPTION
CLASSES problem, and the SATISFY CONSTRAINTS problem are all special cases of
the CAUSAL MODEL problem, an immediate corollary of any one of the above three
lemmas (Lemmas 4.2, 4.3, 4.4) is that the CAUSAL MODEL problem is NP-hard. In
conjunction with Lemma 4.1, we can immediately infer that the CAUSAL MODEL
problem is NP-complete. □

An  immediate  consequence  of  the  above  theorem  is  that  finding  an  adequate  model 
is  NP-hard. 

Theorem  4.2  The  MINIMAL  CAUSAL  MODEL  problem  is  NP-hard. 

Proof:  Since  the  set  of  all  models  is  finite,  it  follows  that  a  minimal  causal  model 
exists  if  and  only  if  a  causal  model  exists.  Hence,  an  algorithm  for  finding  a  minimal 
causal  model  can  be  used  to  decide  whether  or  not  there  exists  a  causal  model. 
Hence, the CAUSAL MODEL problem is Turing-reducible to the MINIMAL CAUSAL
MODEL  problem.  Since  the  CAUSAL  MODEL  problem  is  NP-complete,  it  follows  that 
the  MINIMAL  CAUSAL  MODEL  problem  is  NP-hard.  □ 
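
The reduction used in this proof amounts to a single call to a solver for the MINIMAL CAUSAL MODEL problem. The sketch below is purely illustrative: `find_minimal_causal_model` is a hypothetical oracle, assumed to return a minimal causal model when one exists and `None` otherwise, and the instance is treated as an opaque value.

```python
def causal_model_exists(instance, find_minimal_causal_model):
    """Decide CAUSAL MODEL with one call to a MINIMAL CAUSAL MODEL solver.

    `find_minimal_causal_model` is a hypothetical oracle that returns a
    minimal causal model, or None when no causal model exists. Because the
    set of models is finite, a minimal causal model exists exactly when
    some causal model exists.
    """
    return find_minimal_causal_model(instance) is not None
```

If the oracle ran in polynomial time, so would this decision procedure, which is the sense in which the hardness of CAUSAL MODEL carries over to MINIMAL CAUSAL MODEL.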

4.3  Other  complexity  results 

In  the  previous  section,  we  investigated  the  complexity  of  finding  causal  models.  In 
this  section  we  will  briefly  investigate  the  complexity  of  two  other  cases.  The  first  case 
is  a  restriction  of  the  CAUSAL  MODEL  problem  in  which  each  equation  has  exactly 
one  causal  orientation.  This  is  an  interesting  case  because  it  is  the  same  restriction 
as  the  one  used  in  QP  Theory  [Forbus,  1984]  and  its  derivatives.  The  second  case  is  a 
variation  of  the  CAUSAL  MODEL  problem  in  which  we  do  not  require  that  models  be 
able  to  explain  the  expected  behavior,  i.e.,  we  look  for  coherent  models,  rather  than 
causal  models.  This  is  interesting  because,  while  we  may  not  always  be  interested  in 
causal  models,  we  will  certainly  insist  that  device  models  be  coherent. 

Footnote (on Turing reducibility): Informally, a problem $\Pi_1$ is Turing-reducible to a
problem $\Pi_2$ if there exists an algorithm $A_1$ that solves $\Pi_1$ using a hypothesized
algorithm $A_2$ for solving $\Pi_2$, such that $A_1$ is a polynomial time algorithm if and only if
$A_2$ is a polynomial time algorithm [Garey and Johnson, 1979].
We will show that both the above cases are intractable. Our proofs are based
on the proof of Lemma 4.2, but the proofs could also have been based on the proofs
of Lemmas 4.3 and 4.4.

4.3.1  Fixed  causal  orientations 

We now show that, even if each equation is restricted to have a single causal
orientation, the problem of finding a causal model remains intractable.

Definition  4.10  (FIXED  ORIENTATION)  This  problem  is  the  special  case  of  the 
CAUSAL  MODEL  problem  in  which  each  equation  has  a  fixed  causal  orientation: 

$(\forall e \in E(\mathcal{M}))\; |P_c(e)| = 1$

Lemma 4.5 The FIXED ORIENTATION problem is NP-complete.

Proof: The proof of this lemma is a minor variation of the proof of Lemma 4.2. In the proof of Lemma 4.2, we reduced an arbitrary instance $\mathcal{I}_1$ of the ONE-IN-THREE 3SAT problem to an instance $\mathcal{I}_2$ of the SELECT MODEL FRAGMENTS problem. Here we reduce $\mathcal{I}_1$ to an instance $\mathcal{I}_2'$ of the FIXED ORIENTATION problem. $\mathcal{I}_2'$ is the same as $\mathcal{I}_2$, except that we modify the definition of $P_c$ as follows.

In the proof of Lemma 4.2, we had made $P_c(e) = P(e)$, i.e., each equation could causally determine any parameter in that equation. Here we restrict $P_c(e)$ as follows:

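If $e \in E$, then

$$P_c(e) = \begin{cases}
\{P_{j+1}\} & \text{if } e \in E_j,\ 1 \le j \le m \\
\{P_{m+i+1}\} & \text{if } e \in F_i,\ 1 \le i \le n \\
\{P_0\} & \text{if } e = g_1 \\
\{P_1\} & \text{if } e = g_2 \\
\{P_{m+n+2}\} & \text{if } e = g_3
\end{cases}$$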

i.e.,  each  equation  has  exactly  one  causal  orientation. 

We now show that $\mathcal{I}_2$ and $\mathcal{I}_2'$ have the same causal models. It is easy to see that any causal model of $\mathcal{I}_2'$ is also a causal model of $\mathcal{I}_2$. Hence, we need only show that a causal model of $\mathcal{I}_2$ is also a causal model of $\mathcal{I}_2'$.

In the proof of Lemma 4.2, we made the following claim about all the causal models of $\mathcal{I}_2$:

    ... and hence $E(M)$ contains exactly one equation from each $E_j$, $1 \le j \le m$, exactly one equation from each $F_i$, $1 \le i \le n$, and the three equations in $G$.

Furthermore, when a model is of the above form, we constructed a causal mapping as follows:

    ... Map $g_1$ to $P_0$, $g_2$ to $P_1$, and $g_3$ to $P_{m+n+2}$. Map the representative of $E_j$ to $P_{j+1}$, $1 \le j \le m$. Map the representative of $F_i$ to $P_{m+i+1}$, $1 \le i \le n$.

It is easy to see that the above mapping can still be done when we restrict $P_c$ as shown above. Hence, any causal model in the original proof remains a causal model even with the restriction on $P_c$. Hence, every causal model of $\mathcal{I}_2$ is a causal model of $\mathcal{I}_2'$.

Since $\mathcal{I}_1$ has an acceptable truth assignment if and only if $\mathcal{I}_2$ has a causal model, it follows that $\mathcal{I}_1$ has an acceptable truth assignment if and only if $\mathcal{I}_2'$ has a causal model. Hence, the FIXED ORIENTATION problem is NP-hard.

The  FIXED  ORIENTATION  problem  is  in  NP  because  the  CAUSAL  MODEL  problem 
is  in  NP.  Hence,  the  FIXED  ORIENTATION  problem  is  NP-complete.  □ 

The above lemma shows that the problem of finding adequate device models remains intractable even when each equation is restricted to have a single causal orientation. Since the above proof is based on the proof of Lemma 4.2, it shows that, even with the restriction on the causal orientations of equations, selecting model fragments from selected assumption classes remains intractable. A similar proof, based on the proof of Lemma 4.3, shows that, even with the restriction on the causal orientations of the equations, selecting a set of assumption classes remains intractable.


4.3.2  Finding  coherent  models 

We now show that deciding whether or not there exists a coherent model, rather than a causal model, is also NP-complete.

Definition 4.11 (COHERENT MODEL) Let the input to the COHERENT MODEL problem be the tuple $\mathcal{I}$:

$$\mathcal{I} = (\mathcal{M}, \mathit{contradictory}, \mathit{approximation}, \mathcal{A}, \mathcal{C})$$

where the elements of the tuple are as in Equation 4.1. Does there exist a non-empty coherent model with respect to $\mathcal{I}$?

Lemma  4.6  The  COHERENT  MODEL  problem  is  NP-complete. 

Proof: The proof of this lemma is a variation of the proof of Lemma 4.2. In the proof of Lemma 4.2, we reduced an arbitrary instance $\mathcal{I}_1$ of the ONE-IN-THREE 3SAT problem to an instance $\mathcal{I}_2$ of the SELECT MODEL FRAGMENTS problem. Here we reduce $\mathcal{I}_1$ to an instance $\mathcal{I}_2'$ of the COHERENT MODEL problem. $\mathcal{I}_2'$ is the same as $\mathcal{I}_2$, except for some modifications. The net result of these modifications will be that every coherent model of $\mathcal{I}_2'$ will be a causal model of $\mathcal{I}_2$, and vice versa. Hence, $\mathcal{I}_1$ will have an acceptable truth assignment if and only if $\mathcal{I}_2'$ has a coherent model.

The modifications we make are as follows. We introduce the set $Q$ of $m$ parameters:

$$Q = \{q_j : 1 \le j \le m\}$$

We also introduce the set $H$ of $3m$ equations:

$$H = \{h_{jl} : l \text{ is a literal in clause } c_j\}$$

i.e., there is a new equation corresponding to each literal in each clause. The parameters of the equations are defined as follows:

$$P(h_{jl}) = P_c(h_{jl}) = \{q_j\}, \quad 1 \le j \le m$$

i.e., the equations corresponding to clause $c_j$ can determine $q_j$. We assign these new equations to the model fragments in the same way that we assigned the equations in $E$. In particular, we add equation $h_{jl}$ to model fragment $m_l$. Hence, the equations of model fragment $m_l$ are:

$$E(m_l) = \{e_{jl} : \text{literal } l \text{ is in clause } c_j\} \cup \{f_l\} \cup \{h_{jl} : \text{literal } l \text{ is in clause } c_j\}$$

That concludes the modifications to be made to $\mathcal{I}_2$ to get $\mathcal{I}_2'$. We now show that every coherent model of $\mathcal{I}_2'$ is a causal model of $\mathcal{I}_2$, and vice versa.

First, we show that every coherent model of $\mathcal{I}_2'$ contains at most one equation from $E_j$, $1 \le j \le m$, and at most one equation from $F_i$, $1 \le i \le n$.

Because of the equations that we added to $\mathcal{I}_2'$, if literals $l_1$ and $l_2$ are in the same clause, say $c_j$, then model fragments $m_{l_1}$ and $m_{l_2}$ cannot be used in the same consistent model. This follows from the fact that, if they are used in the same model, then the model's equations would include both $h_{jl_1}$ and $h_{jl_2}$. But it is easy to verify that $\{h_{jl_1}, h_{jl_2}\}$ is an overconstrained set of equations, since both equations have $q_j$ as their only parameter. Hence, the model is not consistent. An immediate consequence of this observation is that a consistent model can contain at most one equation from each $E_j$, $1 \le j \le m$.

Similarly, recall that the two equations in each $F_i$, $1 \le i \le n$, are assigned to contradictory model fragments. Hence, it follows that any consistent model can contain at most one equation from each $F_i$, $1 \le i \le n$.
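The claim that two equations sharing a single parameter form an overconstrained set can be checked mechanically. The following is a minimal sketch, assuming a purely structural reading of "overconstrained" (some subset of equations mentions fewer parameters than it contains equations). The function name `is_overconstrained`, the dictionary encoding of $P(e)$, and the equation labels are illustrative assumptions, not part of the formalization used in the proof.

```python
from itertools import combinations


def is_overconstrained(params_of):
    """Return True if some subset of equations mentions fewer parameters
    than it has equations (an assumed, purely structural test).

    params_of maps an equation name to the set of parameters it mentions.
    The check enumerates all subsets, so it is only meant for tiny examples.
    """
    equations = list(params_of)
    for size in range(1, len(equations) + 1):
        for subset in combinations(equations, size):
            mentioned = set().union(*(params_of[e] for e in subset))
            if len(mentioned) < size:
                return True
    return False


# Two added equations for the same clause c_j: both mention only q_j.
same_clause = {"h_j_l1": {"q_j"}, "h_j_l2": {"q_j"}}
# Added equations for different clauses mention different parameters.
different_clauses = {"h_1_l1": {"q_1"}, "h_2_l2": {"q_2"}}

print(is_overconstrained(same_clause))        # True
print(is_overconstrained(different_clauses))  # False
```

The subset enumeration is exponential and is used here only because the example has two equations; a matching-based test would be the natural choice for larger sets.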

Now we show that any coherent model of $\mathcal{I}_2'$ must contain exactly one equation from each $E_j$, $1 \le j \le m$, exactly one equation from each $F_i$, $1 \le i \le n$, and the equations $g_1, g_2, g_3$. Let $M$ be any coherent model of $\mathcal{I}_2'$. Consider the following causal mapping (based on the mapping shown in Figure 4.1). If $g_1 \in E(M)$, then map $g_1$ to $P_0$; if $g_2 \in E(M)$, then map $g_2$ to $P_1$; if $g_3 \in E(M)$, then map $g_3$ to $P_{m+n+2}$; if $E(M)$ contains an equation from $E_j$, $1 \le j \le m$, then map that equation to $P_{j+1}$; if $E(M)$ contains an equation from $F_i$, $1 \le i \le n$, then map that equation to $P_{m+i+1}$. The last two are possible because we know that any consistent model can contain at most one equation from each $E_j$ and at most one equation from each $F_i$. If each parameter in V is matched to an equation by the above mapping, then this is the same mapping as the one shown in Figure 4.1, and the coherent model is also a causal model of $\mathcal{I}_2$.

If, on the other hand, some parameter is not matched by the above mapping, then we show that $M$ is not complete, i.e., there is a parameter in $E(M)$ that is not matched. Let $P_i$ be the parameter with the largest subscript such that $P_i$ is not matched but $P_{i+1}$ is matched. Such a $P_i$ must exist, for if $P_{m+n+2}$ is the unmatched parameter with the largest subscript, then $P_1$ must also be unmatched (since $g_2$ and $g_3$ belong to the same model fragment). Hence, either none of the parameters are matched (in which case the model is empty), or there is a parameter between $P_1$ and $P_{m+n+2}$ which is matched, and hence the required $P_i$ must exist.

Given such a $P_i$, it is easy to check that $P_{i+1}$ must be matched to an equation, $e$, with parameters $P(e) = \{P_i, P_{i+1}\}$. Since $e \in E(M)$, it follows that $P_i \in P(M)$. Hence, since $P_i$ is unmatched, it follows that $E(M)$ is incomplete. Hence, we have shown that if $M$ is coherent, then all the parameters in V are matched, and hence the coherent model of $\mathcal{I}_2'$ is also a causal model of $\mathcal{I}_2$.

It is easy to see that every causal model of $\mathcal{I}_2$ is also a coherent model of $\mathcal{I}_2'$. Hence, a model is a causal model of $\mathcal{I}_2$ if and only if it is also a coherent model of $\mathcal{I}_2'$. Since $\mathcal{I}_1$ contains an acceptable truth assignment if and only if $\mathcal{I}_2$ contains a causal model, it follows that $\mathcal{I}_1$ contains an acceptable truth assignment if and only if $\mathcal{I}_2'$ contains a coherent model. Hence, the COHERENT MODEL problem is NP-hard.

The  COHERENT  MODEL  problem  is  clearly  in  NP  (the  proof  is  similar  to  the  proof 
that  the  CAUSAL  MODEL  problem  is  in  NP).  Hence,  it  follows  that  the  COHERENT 
MODEL  problem  is  NP-complete.  □ 

Since  the  above  proof  is  based  on  the  proof  of  Lemma  4.2,  it  identifies  a  source 
of  intractability  of  the  COHERENT  MODEL  problem:  even  if  we  are  interested  only 
in  coherent  models,  choosing  model  fragments  from  selected  assumption  classes  is 
intractable.  Similar  proofs,  based  on  the  proofs  to  Lemmas  4.3  and  4.4,  can  be  used 
to  identify  the  other  sources  of  intractability  of  the  COHERENT  MODEL  problem. 

4.4  Summary 

In this chapter we analyzed the complexity of the problem of finding adequate device models. We started by developing a formal statement of the problem. This
development  showed  how  the  component  library,  consisting  of  model  fragment  classes 
and  first-order  constraints,  can  be  converted  into  a  set  of  model  fragments,  a  set  of 
relations  between  model  fragments,  and  a  set  of  propositional  coherence  constraints. 
This  conversion  is  done  with  the  help  of  the  structural  and  behavioral  contexts. 

We then showed that the problem of finding adequate device models is intractable. We gave three different proofs for this result, which helped us identify three different sources of the intractability. Informally, intractability arises in (a) deciding what phenomena to model; (b) deciding how to model selected phenomena; and (c) satisfying any domain-dependent constraints.

We  then  showed  that  certain  related  problems  are  also  intractable.  We  first  showed 
that,  even  if  equations  are  restricted  to  have  fixed  causal  orientations,  the  problem  of 
finding  adequate  device  models  remains  intractable.  We  also  showed  that  the  problem 
of  finding  coherent  models,  rather  than  causal  models,  is  also  intractable. 


Chapter  5 

Causal  approximations 


In the previous chapter we showed that the problem of finding adequate device models (the MINIMAL CAUSAL MODEL problem) is intractable. This means that, unless P = NP, there is no efficient, polynomial time algorithm for finding adequate models: any algorithm for finding such models can be expected to take, in the worst case, an exponential amount of time. To put it another way, any algorithm for finding adequate models will be forced to search a significant portion of the exponentially large space of possible device models. Unfortunately, even for fairly simple devices, the space of possible models is prohibitively large. Searching any significant portion of such a huge space is unthinkable.

However, the apparent intractability of finding adequate models seems to directly contradict the informal observation that trained engineers are remarkably good at providing parsimonious causal explanations for phenomena. One way to resolve this apparent contradiction is to assume that trained engineers are not solving the general MINIMAL CAUSAL MODEL problem. Rather, the problem instances that they normally encounter are drawn from a subclass of the MINIMAL CAUSAL MODEL problem which can, in fact, be solved efficiently. In this chapter, we identify such an efficiently solvable subclass. We believe that commonly encountered instances of the MINIMAL CAUSAL MODEL problem are, in fact, drawn from this subclass.

Since this chapter is quite long, we give a detailed road map of its sections. Section 5.1 introduces the basic idea underlying the efficiently solvable subclass of the MINIMAL CAUSAL MODEL problem. It introduces the upward failure property, and shows that if an instance of the MINIMAL CAUSAL MODEL problem satisfies the upward failure property, and if the immediate simplifications of a coherent model can be generated in polynomial time, then a minimal causal model can be found in polynomial time. Unfortunately, in general, it is difficult to decide whether or not the upward failure property is satisfied, and whether or not coherent models have a polynomial number of immediate simplifications. Hence, the rest of the chapter focuses on finding efficient characterizations of these properties.

Section  5.2  introduces  a  set  of  preliminary  restrictions  on  the  MINIMAL  CAUSAL 
MODEL  problem.  Sections  5.3  and  5.4  introduce  a  set  of  restrictions  on  the  MINIMAL 
CAUSAL  MODEL  problem  that  ensure  that  the  upward  failure  property  is  satisfied. 
Section 5.3 introduces a special class of approximations, called causal approximations,
and  shows  that  when  all  the  approximations  are  causal  approximations,  the  causal 
relations  entailed  by  a  model  decrease  monotonically  as  model  fragments  are  replaced 
by  their  approximations.  Section  5.4  generalizes  these  results  to  the  case  in  which 
models  are  also  simplified  by  dropping  model  fragments. 

Sections 5.5 and 5.6 focus on the problem of efficiently generating the immediate
simplifications  of  a  coherent  model.  Section  5.5  shows  that  the  model  fragments  of  a 
coherent  model  can  be  approximated  one  at  a  time.  Section  5.6  introduces  a  syntactic 
restriction  on  the  expressive  power  of  the  propositional  coherence  constraints  in  C. 
This  restriction  ensures  that  models  can  be  efficiently  simplified. 

Finally,  Section  5.7  puts  all  the  restrictions  together,  and  presents  an  efficient 
algorithm  for  finding  a  minimal  causal  model.  Throughout  the  chapter,  we  will  also 
discuss  the  reasonableness  of  the  restrictions. 

The discussion in this chapter is restricted to models that do not contain differential equations. Differential equations are discussed in Chapter 6. In the rest of this chapter, we follow the terminology introduced in the previous chapter and let the tuple

$$\mathcal{I} = (\mathcal{M}, \mathit{contradictory}, \mathit{approximation}, \mathcal{A}, \mathcal{C}, p, q) \tag{5.1}$$

be an arbitrary instance of the MINIMAL CAUSAL MODEL problem, where the elements of the tuple are as in Equation 4.1.

5.1 Upward failure property

Intuitively,  the  reason  that  the  MINIMAL  CAUSAL  MODEL  problem  is  intractable  seems 
to  be  that  knowing  whether  a  particular  model  is,  or  is  not,  a  causal  model  tells  us 
very  little  about  which  other  models  are,  or  are  not,  causal  models.  This  means  that 
there  is  no  “clever”  way  to  organize  the  search  for  adequate  models,  that  allows  us  to 
rule  out  “large”  parts  of  the  search  space  by  explicitly  checking  only  a  “small”  part 
of  the  search  space.  With  this  intuition  in  mind,  we  introduce  the  upward  failure 
property. 

The  upward  failure  property  is  based  on  the  intuition  that  if  a  model  is  unable 
to  explain  the  phenomenon  of  interest,  there  is  little  reason  to  believe  that  a  simpler 
model  will  be  able  to  explain  that  phenomenon.  We  make  this  precise  with  the 
following  definition,  which  is  similar  in  spirit  to  the  one  given  in  [Weld  and  Addanki, 
1991]: 


Definition 5.1 (Upward failure property) An instance $\mathcal{I}$ of the MINIMAL CAUSAL MODEL problem is said to satisfy the upward failure property if and only if for all coherent models $M \subseteq \mathcal{M}$, if $M$ is not a causal model, then no strictly simpler model is a causal model, i.e., no model $M' \subseteq \mathcal{M}$ with $M' < M$ is a causal model.


In essence, the upward failure property says that the simpler the model, the
less  it  can  explain.  Of  course,  it  is  by  no  means  obvious  that  simpler  models  explain 
fewer  phenomena.  However,  it  does  seem  to  be  standard  engineering  practice  that 
models  that  account  for  more  phenomena  are  more  complex  by  our  definition,  i.e., 
modeling  more  phenomena  more  accurately  leads  to  models  that  can  explain  more. 
This  is,  of  course,  not  an  argument  for  claiming  that  the  upward  failure  property 
is  satisfied  by  all  commonly  encountered  instances  of  the  MINIMAL  CAUSAL  MODEL 
problem. Rather, it merely provides a motivation for our definition of the upward failure property.

5.1.1 Efficient model selection algorithm

Earlier we said that the reason that the MINIMAL CAUSAL MODEL problem is intractable seems to be that knowing whether a particular model is, or is not, a causal model tells us very little about which other models are, or are not, causal models. The upward failure property addresses exactly this problem: knowing that a coherent model is not a causal model allows us to rule out all simpler models as possibly adequate models. Hence, we can exploit this property to develop an efficient, polynomial time algorithm for solving instances of the MINIMAL CAUSAL MODEL problem that satisfy the upward failure property. The algorithm we develop has two parts: (a) finding an initial causal model; and (b) finding an adequate model by simplifying the initial causal model. We start by discussing how a causal model can be simplified, and then discuss how we find an initial causal model.

Simplifying  a  model 

A causal model can be simplified to a minimal causal model using the function find-minimal-causal-model shown in Figure 5.1. This function takes two arguments: (a) $\mathcal{I}$, an instance of MINIMAL CAUSAL MODEL; and (b) a coherent model $M$. It returns an adequate model (i.e., a minimal causal model) that is simpler than $M$. If there is more than one such adequate model, it returns the first one it finds. If no such model exists, it returns nil.

The simplifications function, used in find-minimal-causal-model, when applied to a coherent model $M$, returns the set of coherent models that are immediate simplifications of $M$. A coherent model $M'$ is an immediate simplification of $M$ if and only if $M' < M$ and there does not exist a coherent model $M''$ such that $M' < M'' < M$.

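$$\mathit{simplifications}(M, \mathcal{I}) = \{M' : M' \text{ is coherent wrt } \mathcal{I} \;\land\; M' < M \;\land\; (\forall M'')\; M' < M'' < M \Rightarrow M'' \text{ is not coherent wrt } \mathcal{I}\} \tag{5.2}$$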
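The definition of immediate simplifications can be illustrated on a small, explicitly enumerated set of coherent models. The following is a minimal sketch under stated assumptions: the coherent models are given as an explicit list, the strict simplicity ordering is passed in as a predicate, and the toy ordering used in the example is proper subset inclusion, chosen only for illustration. The name `immediate_simplifications` and the single-letter model fragments are hypothetical.

```python
def immediate_simplifications(model, coherent_models, strictly_simpler):
    """Coherent models strictly simpler than `model` with no coherent
    model strictly between them and `model`."""
    below = [m for m in coherent_models if strictly_simpler(m, model)]
    # Keep m only if no other coherent model lies strictly between m and model.
    return [m for m in below
            if not any(strictly_simpler(m, mid) for mid in below)]


# Toy space: models are sets of fragment names; "simpler" is proper subset.
coherent = [frozenset("abc"), frozenset("ab"), frozenset("a"), frozenset("bc")]
proper_subset = lambda x, y: x < y

result = immediate_simplifications(frozenset("abc"), coherent, proper_subset)
print([sorted(m) for m in result])  # [['a', 'b'], ['b', 'c']]
```

Here {a} is excluded because the coherent model {a, b} lies strictly between it and {a, b, c}.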

Find-minimal-causal-model($M$, $\mathcal{I}$) works by systematically searching the immediate simplifications of $M$, until it finds a causal model $M'$ such that all the immediate simplifications of $M'$ are not causal models. The upward failure property then assures us that $M'$ is a minimal causal model.

function find-minimal-causal-model(M, I)
    /* I is assumed to satisfy the upward failure property */
    /* M is assumed to be coherent */
    if M is not a causal model then
        /* Since no simpler model can be a causal model */
        return nil
    else
        for each M' ∈ simplifications(M, I) do
            result := find-minimal-causal-model(M', I)
            if result ≠ nil then
                /* A simpler causal model has been found */
                return result
            endif
        endfor
        /* No simplification is a causal model, but M is */
        return M
    endif
end

Figure 5.1: Function find-minimal-causal-model
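The function of Figure 5.1 translates almost line for line into executable form. The following is a minimal sketch, assuming that the causal-model test and the simplifications function are supplied as callables; the parameter names `is_causal` and `simplifications`, and the toy device with fragments named "resistor", "thermal", and "inductance", are hypothetical and serve only to exercise the recursion.

```python
from typing import Callable, FrozenSet, Iterable, Optional

Model = FrozenSet[str]


def find_minimal_causal_model(
    model: Model,
    is_causal: Callable[[Model], bool],
    simplifications: Callable[[Model], Iterable[Model]],
) -> Optional[Model]:
    """Assumes the upward failure property holds and `model` is coherent."""
    if not is_causal(model):
        # No simpler model can be a causal model.
        return None
    for simpler in simplifications(model):
        result = find_minimal_causal_model(simpler, is_causal, simplifications)
        if result is not None:
            # A simpler causal model has been found.
            return result
    # No simplification is a causal model, but `model` is.
    return model


# Toy instance: a model is causal iff it contains the "resistor" fragment,
# and immediate simplifications drop exactly one fragment.
def toy_is_causal(model: Model) -> bool:
    return "resistor" in model


def toy_simplifications(model: Model):
    return [model - {fragment} for fragment in sorted(model)]


start = frozenset({"resistor", "thermal", "inductance"})
minimal = find_minimal_causal_model(start, toy_is_causal, toy_simplifications)
print(sorted(minimal))  # ['resistor']
```

The toy instance satisfies the upward failure property because a model without "resistor" has no subset containing it, so pruning at the first non-causal model is safe.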

The following two lemmas establish the correctness and efficiency of this function. The proofs will be by induction. This is possible because every recursive call made by find-minimal-causal-model($M$, $\mathcal{I}$) replaces $M$ by a model that is strictly simpler than $M$. Since there are a finite number of models, the recursive calls are guaranteed to bottom out, and hence we can use induction in our proofs. We first prove the correctness of find-minimal-causal-model.

Lemma 5.1 Let $\mathcal{I}$ be an instance of the MINIMAL CAUSAL MODEL problem that satisfies the upward failure property, and let $M \subseteq \mathcal{M}$ be a coherent model. Then find-minimal-causal-model($M$, $\mathcal{I}$) returns an adequate model (i.e., a minimal causal model) of $\mathcal{I}$ that is simpler than $M$, if it exists, and nil otherwise.

Proof: We prove this lemma by induction. There are two base cases:

1. First, $M$ is not a causal model. Since $\mathcal{I}$ satisfies the upward failure property, it follows that there is no causal model simpler than $M$, and hence no adequate model simpler than $M$. Since $M$ is not a causal model, the condition of the first if statement succeeds, and find-minimal-causal-model($M$, $\mathcal{I}$) returns nil. Hence, the lemma is true for this base case.

2. Second, $M$ is a causal model, but it has no immediate simplifications. Hence, $M$ is an adequate model. In this case, find-minimal-causal-model($M$, $\mathcal{I}$) enters the else clause, does not enter the for loop, and immediately returns $M$. Hence, the lemma is also true for this base case.

Now, for the inductive step, assume that $M$ is a causal model with immediate simplifications. There are two cases:

1. For every $M' \in \mathit{simplifications}(M, \mathcal{I})$, find-minimal-causal-model($M'$, $\mathcal{I}$) returns nil. By induction, this means that there is no adequate model that is simpler than any of the immediate simplifications of $M$. But every coherent model that is strictly simpler than $M$ is simpler than some immediate simplification of $M$. Hence, it follows that there is no adequate model that is strictly simpler than $M$. Since $M$ is a causal model, it follows that $M$ is an adequate model. When every $M' \in \mathit{simplifications}(M, \mathcal{I})$ is such that find-minimal-causal-model($M'$, $\mathcal{I}$) returns nil, find-minimal-causal-model($M$, $\mathcal{I}$) exits the for loop, and returns $M$. Hence, the lemma is satisfied for this case of the inductive step.

2. Some $M' \in \mathit{simplifications}(M, \mathcal{I})$ is such that find-minimal-causal-model($M'$, $\mathcal{I}$) returns a non-nil value. Let $M'$ be the first such model encountered in the for loop. Hence, find-minimal-causal-model($M$, $\mathcal{I}$) can be seen to return find-minimal-causal-model($M'$, $\mathcal{I}$). By induction, the value returned by find-minimal-causal-model($M'$, $\mathcal{I}$) is an adequate model that is simpler than $M'$. Since $M' < M$, it follows that this model is also an adequate model simpler than $M$. Hence, the lemma is satisfied for this case of the inductive step.

Hence, if $M$ is a coherent model, then find-minimal-causal-model($M$, $\mathcal{I}$) returns an adequate model of $\mathcal{I}$ that is simpler than $M$, if it exists, and nil otherwise. □

Next, we show that if the immediate simplifications of a coherent model can be computed in polynomial time, then find-minimal-causal-model also runs in polynomial time.

Lemma 5.2 Let $\mathcal{I}$ be an instance of the MINIMAL CAUSAL MODEL problem, and let $M \subseteq \mathcal{M}$ be a coherent model. If the immediate simplifications of every coherent model of $\mathcal{I}$ can be computed in time polynomial in the size of $\mathcal{I}$, then find-minimal-causal-model($M$, $\mathcal{I}$) terminates in time polynomial in the size of $\mathcal{I}$.

Proof: We prove this lemma by induction. There are two base cases: (a) $M$ is not a causal model; and (b) $M$ is a causal model with no immediate simplifications. In both cases, one can see that the only significant work that find-minimal-causal-model($M$, $\mathcal{I}$) does is to check whether or not $M$ is a causal model. But from Lemma 4.1, we know that this can be done in polynomial time, and hence this lemma is true in the base cases.

For the inductive step, assume that $M$ is a causal model with immediate simplifications. In this case, find-minimal-causal-model($M$, $\mathcal{I}$) does the following:

1. Checks whether or not $M$ is a causal model. This can be done in polynomial time.

2. Generates the immediate simplifications of $M$, and makes recursive calls to find-minimal-causal-model for some or all of the immediate simplifications. By assumption, the immediate simplifications of $M$ can be generated in polynomial time. Hence, it follows that $M$ has a polynomial number of immediate simplifications. By the inductive assumption, each of these calls terminates in polynomial time. Hence, the process of generating the immediate simplifications and making the recursive calls to find-minimal-causal-model takes polynomial time.

Hence, the lemma is true for the inductive step, i.e., find-minimal-causal-model($M$, $\mathcal{I}$) terminates in polynomial time even when $M$ is a causal model with immediate simplifications.

Hence, if $M$ is a coherent model, and if the immediate simplifications of every coherent model can be generated in time polynomial in the size of $\mathcal{I}$, then find-minimal-causal-model($M$, $\mathcal{I}$) terminates in time polynomial in the size of $\mathcal{I}$. □

Finding  an  initial  causal  model 

Given any causal model $M$, the find-minimal-causal-model function can be used to find a minimal causal model that is simpler than $M$. Hence, to find a minimal causal model of $\mathcal{I}$, we must first find some causal model of $\mathcal{I}$. In Chapter 8, we will discuss a heuristic method for finding such an initial causal model. Here, however, we discuss an alternate method of finding an initial causal model, that does not rely on heuristics.

Let us introduce a fictitious model $M_\top$, representing a model that is more accurate than every model of $\mathcal{I}$.¹ Hence, the immediate simplifications of $M_\top$ are just those models of $\mathcal{I}$ that are not strictly simpler than any other models of $\mathcal{I}$:

$$\mathit{simplifications}(M_\top, \mathcal{I}) = \{M : M \text{ is coherent wrt } \mathcal{I} \;\land\; \neg(\exists M')\,(M' \text{ is coherent wrt } \mathcal{I} \;\land\; M < M')\} \tag{5.3}$$

We can then use find-minimal-causal-model to find an adequate model of $\mathcal{I}$ by assuming that $M_\top$ is, by fiat, a causal model. In that case,

    find-minimal-causal-model($M_\top$, $\mathcal{I}$)

returns an adequate model of $\mathcal{I}$, if it exists, or returns $M_\top$ if $\mathcal{I}$ has no causal models.

Note that, to find an adequate model of $\mathcal{I}$ in polynomial time, using the above function call, we will require that $\mathit{simplifications}(M_\top, \mathcal{I})$ return in polynomial time, i.e., $\mathcal{I}$ must have a polynomial number of most accurate models, all of which can be generated in polynomial time. Hence, we have the following theorem:


¹Note that $M_\top$ isn't really a set of model fragments. It is just a fictitious model that is assumed to be more accurate than every model.

Theorem 5.1 Let $\mathcal{I}$ be an instance of the MINIMAL CAUSAL MODEL problem that satisfies the upward failure property. If the most accurate models of $\mathcal{I}$ can be generated in time polynomial in the size of $\mathcal{I}$, and if the immediate simplifications of any coherent model of $\mathcal{I}$ can be generated in time polynomial in the size of $\mathcal{I}$, then an adequate model of $\mathcal{I}$ can be found in time polynomial in the size of $\mathcal{I}$.

Proof:  Immediate  consequence  of  Lemmas  5.1  and  5.2  and  the  above  discussion.  □ 
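The construction behind the theorem, starting the search from a fictitious most accurate model, can be sketched as follows. This is an illustrative sketch only: it assumes the coherent models are available as an explicit list (which is exactly what the later restrictions are meant to avoid), uses a sentinel object `TOP` for the fictitious model, and uses proper subset inclusion as a toy simplicity ordering. The names `TOP`, `find_adequate_model`, and the single-letter fragments are hypothetical.

```python
TOP = object()  # stands in for the fictitious most accurate model


def find_minimal_causal_model(model, is_causal, simplifications):
    """Recursive search of Figure 5.1, with nil rendered as None."""
    if not is_causal(model):
        return None
    for simpler in simplifications(model):
        result = find_minimal_causal_model(simpler, is_causal, simplifications)
        if result is not None:
            return result
    return model


def find_adequate_model(coherent_models, strictly_simpler, is_causal):
    """Search downward from TOP; returns TOP if there is no causal model."""
    def causal_or_top(model):
        # TOP is a causal model by fiat.
        return model is TOP or is_causal(model)

    def simplifications(model):
        if model is TOP:
            below = list(coherent_models)  # every coherent model is below TOP
        else:
            below = [m for m in coherent_models if strictly_simpler(m, model)]
        # Keep only the models with nothing coherent strictly above them.
        return [m for m in below
                if not any(strictly_simpler(m, other) for other in below)]

    return find_minimal_causal_model(TOP, causal_or_top, simplifications)


coherent = [frozenset("ab"), frozenset("a"), frozenset("b"), frozenset("c")]
proper_subset = lambda x, y: x < y

found = find_adequate_model(coherent, proper_subset, lambda m: "a" in m)
none_found = find_adequate_model(coherent, proper_subset, lambda m: "z" in m)
print(sorted(found))       # ['a']
print(none_found is TOP)   # True
```

In the first call the most accurate models are {a, b} and {c}; the search descends from {a, b} to {a}, which has no coherent simplifications and is therefore returned.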

5.1.2  Discussion 

We  have  seen  that  the  upward  failure  property  is  useful  because  it  leads  to  an  efficient 
algorithm  for  finding  an  adequate  model.  However,  it  has  a  major  drawback:  it  is  very 
difficult  to  decide  whether  or  not  a  particular  instance  of  the  MINIMAL  CAUSAL  MODEL 
problem  satisfies  the  upward  failure  property.  For  example,  a  straightforward  use  of 
Definition  5.1  requires  us  to  check  every  model  in  the  space  of  possible  models.  Since 
the  space  of  possible  models  is  exponentially  large,  any  such  check  is  unthinkable.  In 
fact,  the  upward  failure  property  was  suggested  as  a  way  around  having  to  check  the 
whole  space  of  possible  models.  Unfortunately,  it  does  not  seem  to  have  succeeded  in 
helping  us  to  circumvent  this  problem. 
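To make the cost of the straightforward check concrete, the following sketch tests Definition 5.1 directly by enumerating every model over a small set of fragments. It is an illustration under assumptions: every subset of fragments is treated as coherent, the simplicity ordering is taken to be proper subset inclusion, and the names `all_models` and `satisfies_upward_failure` are hypothetical. With $n$ fragments the enumeration visits $2^n$ models, which is why such a check does not scale.

```python
from itertools import combinations


def all_models(fragments):
    """Every subset of the fragments: 2**len(fragments) models."""
    return [frozenset(c)
            for r in range(len(fragments) + 1)
            for c in combinations(fragments, r)]


def satisfies_upward_failure(coherent_models, strictly_simpler, is_causal):
    """Direct check: no non-causal coherent model has a strictly simpler
    model that is causal."""
    for model in coherent_models:
        if is_causal(model):
            continue
        if any(is_causal(other) for other in coherent_models
               if strictly_simpler(other, model)):
            return False
    return True


models = all_models(["a", "b", "c"])
proper_subset = lambda x, y: x < y

# Causal iff fragment "a" is present: dropping fragments never helps.
monotone = lambda m: "a" in m
# Causal only for exactly {"a"}: {"a", "b"} fails although {"a"} succeeds.
non_monotone = lambda m: m == frozenset({"a"})

print(len(models))                                                # 8
print(satisfies_upward_failure(models, proper_subset, monotone))      # True
print(satisfies_upward_failure(models, proper_subset, non_monotone))  # False
```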

This drawback of the upward failure property stems from the fact that it is a global property, i.e., a property of the whole space of possible models. What we want is a local property that entails the upward failure property, i.e., a property of the encoding of $\mathcal{I}$ that can be checked efficiently, that will ensure that $\mathcal{I}$ satisfies the upward failure property.

In the next few sections we present some local properties of $\mathcal{I}$ that ensure that $\mathcal{I}$ satisfies the upward failure property. In particular, we will go back to the sources of intractability identified in the previous chapter, and place appropriate restrictions on $\mathcal{I}$ that will make the MINIMAL CAUSAL MODEL problem tractable: (a) we will introduce a new class of approximations, called causal approximations, that will address the problem of selecting model fragments from selected assumption classes; (b) we will add additional constraints to $\mathcal{C}$, called ownership constraints, that will address the problem of selecting assumption classes; and (c) we will restrict the expressive power of constraints in $\mathcal{C}$.


5.2  Preliminary  restrictions 

In this section we introduce three preliminary restrictions on I. First, we assume
that  the  contradictory  relation  partitions  the  set  of  model  fragments  into  the  set  of 
assumption  classes,  i.e.,  model  fragments  are  in  the  same  assumption  class  if  and  only 
if  they  are  mutually  contradictory  (see  Definition  4.10).  Recall  that  an  assumption 
class  is  a  set  of  different  descriptions  of  the  same  phenomena,  i.e.,  a  set  of  mutually 
contradictory  model  fragments  describing  the  same  phenomena.  Hence,  the  above 
assumption  is  based  on  the  intuition  that  there  is  little  reason  for  descriptions  of 
different  phenomena  to  be  mutually  contradictory. 

Second,  we  assume  that  each  assumption  class  has  a  single,  most  accurate  model 
fragment: 


\[ (\forall A \in \mathcal{A})(\exists m \in A)(\forall m' \in A)\;\; m \neq m' \Rightarrow \mathit{approximation}(m, m') \tag{5.4} \]

In other words, we assume that each phenomenon has a single best description. This is
a  reasonable  assumption  as  long  as  we  only  model  fairly  well  understood  phenomena, 
i.e.,  where  there  is  broad  consensus  amongst  the  domain  experts  about  how  best  to 
model  the  phenomena. 

Note that the above restriction appears to be a problem when a given phenomenon
can  be  modeled  with  multiple  ontologies.  In  such  cases,  it  may  not  be  possible  to  say 
that  one  ontology  is  more  accurate  than  the  other,  leading  to  multiple  most  accurate 
model  fragments  in  an  assumption  class.  However,  this  does  not  pose  a  problem  if  the 
different  ontologies  are  not  mutually  contradictory,  so  that  model  fragments  that  use 
different  ontologies  are  in  different  assumption  classes.  This  is  often  the  case,  since 
different  ontologies  are  often  used  for  different  purposes. 

For example, magnetism can be modeled either as magnetic fields [Halliday and
Resnick, 1978], or as magnetic circuits [Coren, 1989]. However, these different
ontologies are used for different purposes. In particular, the magnetic field ontology
is suitable for reasoning about the interactions of magnets with externally applied
magnetic fields, e.g., in generators and motors. On the other hand, the magnetic
circuits ontology is useful for reasoning about the interactions of magnets with
magnetic materials, e.g., in electric relays and door bells. Hence, it is perfectly acceptable
for a model to include both ontologies, and we can meet the requirement that
each assumption class has a single most accurate model fragment by placing model
fragments using the different ontologies in different assumption classes.

An  important  consequence  of  the  above  restriction  is  that  it  leads  to  I  having 
a  single  most  accurate  model:  the  most  accurate  model  of  I  is  just  the  set  of  most 
accurate model fragments of the assumption classes of I. This brings us to our
third  assumption:  we  assume  that  the  most  accurate  model  of  I  is  coherent,  i.e., 
we  assume  that  the  most  accurate  model  is  complete  and  that  it  satisfies  all  the 
domain-dependent  structural  and  behavioral  coherence  constraints. 

The second and third assumptions together imply that the immediate simplifications
of Mt contain exactly one model: the most accurate model of I. It is easy to
see that the most accurate model of I can be constructed in polynomial
time, and hence the immediate simplifications of Mt can be computed in polynomial
time.

In  summary,  the  preliminary  restrictions  that  we  place  on  I  are  the  following: 

•  The  contradictory  relation  partitions  the  set  of  model  fragments  in  M  into  the 
set  A  of  assumption  classes. 

•  Each  assumption  class  in  A  contains  a  single  most  accurate  model  fragment. 

•  The most accurate model of I, which is the set of most accurate model fragments
of the assumption classes in A, is coherent.
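
The second and third restrictions suggest a direct way of building the most accurate model. The following is a minimal sketch, not the dissertation's implementation: it assumes the approximation relation is given as a transitively closed set of pairs (m1, m2), read as "m2 is an approximation of m1", and the function names and the toy instance are hypothetical.

```python
def most_accurate_fragment(assumption_class, approximation):
    """Return the fragment m such that approximation(m, m') holds for every other m'."""
    candidates = [
        m for m in assumption_class
        if all((m, other) in approximation for other in assumption_class if other != m)
    ]
    if len(candidates) != 1:
        raise ValueError("no single most accurate model fragment in this class")
    return candidates[0]


def most_accurate_model(assumption_classes, approximation):
    """Collect the most accurate fragment of every assumption class."""
    return {most_accurate_fragment(a, approximation) for a in assumption_classes}


# Toy instance (assumed for illustration); pairs read "second approximates first".
approximation = {
    ("Resistor", "Ideal-conductor"),
    ("Resistor", "Ideal-insulator"),
    ("Temperature-dependent-resistance", "Constant-resistance"),
}
assumption_classes = [
    {"Resistor", "Ideal-conductor", "Ideal-insulator"},
    {"Temperature-dependent-resistance", "Constant-resistance"},
]
model = most_accurate_model(assumption_classes, approximation)
```

The sketch visits each pair of fragments within a class once, which is consistent with the claim that the most accurate model can be constructed in polynomial time. Checking that the resulting model is coherent (the third restriction) is not attempted here.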

5.3  Causal  Approximations 

We now introduce an important class of approximations, called causal approximations,
that are commonly found in modeling the physical world. The basic idea
underlying the definition of causal approximations is that more approximate descriptions
often tend to involve fewer parameters. Furthermore, more accurate descriptions
tend to explain more about a phenomenon than more approximate descriptions.

For example, Figure 2.6 shows different descriptions of electrical conduction in a
wire. Figure 2.7 shows the approximation relation between these descriptions. Note
that the parameters in the equations of the more approximate descriptions ($V_w = 0$
and $i_w = 0$) are a subset of the parameters in the equations of the more accurate
description ($V_w = i_w R_w$). Furthermore, only Resistor(wire-1) is able to explain
the relationship between $V_w$, $i_w$, and $R_w$.

In  this  section,  we  make  the  above  idea  precise,  and  investigate  its  consequences. 
As mentioned earlier, we restrict our discussion to models that do not include differential
equations. Differential equations will be discussed in Chapter 6.

5.3.1  Definitions 

We  start  by  defining  local  parameters.  A  local  parameter  is  a  parameter  that  can 
be  causally  determined  only  by  equations  of  model  fragments  in  a  single  assumption 
class. 

Definition 5.2 (Local parameters) A parameter $p$ is said to be local to a model
fragment $m \in \mathcal{M}$ if and only if $p$ can be causally determined by the equations of $m$,
but not by the equations of any model fragment that does not contradict $m$:

\[ p \in P_c(m) \;\wedge\; (\forall m' \in \mathcal{M})\;\; \bigl(m \neq m' \wedge p \in P_c(m')\bigr) \Rightarrow \mathit{contradictory}(m, m') \]

A  parameter  is  said  to  be  shared  if  it  is  not  local  to  any  model  fragment. 
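
Definition 5.2 can be checked one parameter at a time. The sketch below is illustrative only: it assumes each model fragment is summarized by the set Pc(m) of parameters its equations can causally determine, and that the contradictory relation is given as a set of unordered pairs. The "Voltage-source" fragment and the parameter sets are hypothetical, introduced to show a shared parameter.

```python
def is_local(p, m, pc, contradictory):
    """True if p is local to fragment m in the sense of Definition 5.2."""
    if p not in pc[m]:
        return False
    # Every other fragment that can determine p must contradict m.
    return all(
        frozenset((m, other)) in contradictory
        for other, determinable in pc.items()
        if other != m and p in determinable
    )


# Pc(m) for a toy set of fragments (assumed values, for illustration only).
pc = {
    "Resistor": {"V_w", "i_w"},
    "Ideal-conductor": {"V_w"},
    "Temperature-dependent-resistance": {"R_w", "R_w0", "alpha_w", "T_w0"},
    "Constant-resistance": {"R_w"},
    "Voltage-source": {"V_w"},  # hypothetical fragment in another assumption class
}
contradictory = {
    frozenset(("Resistor", "Ideal-conductor")),
    frozenset(("Temperature-dependent-resistance", "Constant-resistance")),
}

r_w0_is_local = is_local("R_w0", "Temperature-dependent-resistance", pc, contradictory)
v_w_is_local = is_local("V_w", "Resistor", pc, contradictory)
```

In this toy instance R_w0 is local because no other fragment can determine it, whereas V_w is not local to Resistor because the non-contradictory Voltage-source fragment can also determine it.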

Using the above definition, we now define causal approximations. The idea underlying
this definition is that if $m_2$ is a causal approximation of $m_1$, then any causal
orientation of the equations of $m_2$ can be extended to a causal orientation of the
equations of $m_1$, such that the latter causal orientation entails a superset of causal
relations, i.e., $m_1$ can explain more than $m_2$:

Definition 5.3 (Causal approximations) A model fragment $m_2$ is said to be a
causal approximation of a model fragment $m_1$ if and only if:




CHAPTER 5. CAUSAL APPROXIMATIONS


1.  $m_2$ is an approximation of $m_1$;

2.  There exists a 1-1 mapping $G : m_2 \to m_1$ such that for each $e \in m_2$, $P(e) \subseteq
P(G(e))$ and $P_c(e) \subseteq P_c(G(e))$. $G$ is called a correspondence mapping, and $e$
and $G(e)$ are said to be corresponding equations; and

3.  Let $E^*$ denote the equations of $m_1$ that have no corresponding equations in $m_2$,
and let $P^*$ denote the set of parameters that are local to $m_1$, but not local to
$m_2$. Then there exists an onto causal mapping $L : E^* \to P^*$. $L$ is called a local
causal mapping with respect to correspondence mapping $G$.

Condition 1 ensures that causal approximations are approximations. Condition 2
ensures that for any causal orientation of an equation $e \in m_2$, there exists a causal
orientation of $G(e) \in m_1$ which entails a superset of causal relations. Condition 3
ensures that additional equations in $m_1$ can be oriented to causally determine newly
introduced local parameters.

Ideal-conductor(wire-1):  $V_w = 0$

Resistor(wire-1):  $V_w = i_w R_w$

approximation(Resistor(wire-1), Ideal-conductor(wire-1))

Figure 5.2: Model fragments describing electrical conduction in wire-1

For example, the approximation relation between Resistor(wire-1) and
Ideal-conductor(wire-1) shown in Figure 5.2 is a causal approximation. In particular,
$V_w = 0$ and $V_w = i_w R_w$ are corresponding equations with:

\[ P(V_w = 0) \subseteq P(V_w = i_w R_w) \]

\[ P_c(V_w = 0) \subseteq P_c(V_w = i_w R_w) \]

Similarly, the approximation relation between Temperature-dependent-resistance(wire-1)
and Constant-resistance(wire-1) shown in Figure 5.3 is a causal
approximation if we assume that $R_{w0}$, $\alpha_w$, and $T_{w0}$ are local parameters of
Temperature-dependent-resistance(wire-1).




Constant-resistance(wire-1):  $\mathit{exogenous}(R_w)$

Temperature-dependent-resistance(wire-1):  $R_w = R_{w0}(1 + \alpha_w(T_w - T_{w0}))$
                                           $\mathit{exogenous}(R_{w0})$
                                           $\mathit{exogenous}(\alpha_w)$
                                           $\mathit{exogenous}(T_{w0})$

approximation(Temperature-dependent-resistance(wire-1),
              Constant-resistance(wire-1))

Figure 5.3: Model fragments describing a wire’s resistance.

In particular, $\mathit{exogenous}(R_w)$ and $R_w = R_{w0}(1 + \alpha_w(T_w - T_{w0}))$ are corresponding
equations with:

\[ P(\mathit{exogenous}(R_w)) \subseteq P(R_w = R_{w0}(1 + \alpha_w(T_w - T_{w0}))) \]

\[ P_c(\mathit{exogenous}(R_w)) \subseteq P_c(R_w = R_{w0}(1 + \alpha_w(T_w - T_{w0}))) \]

and the local causal mapping, $L$, with respect to this correspondence mapping is:

\[ L(\mathit{exogenous}(R_{w0})) = R_{w0} \]

\[ L(\mathit{exogenous}(\alpha_w)) = \alpha_w \]

\[ L(\mathit{exogenous}(T_{w0})) = T_{w0} \]
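
Conditions 2 and 3 of Definition 5.3 can be checked mechanically for small model fragments. The following brute-force sketch is illustrative only and makes several assumptions: each equation is summarized by a pair (P(e), Pc(e)), a causal mapping is taken to be 1-1 with L(e) in Pc(e), the sets of local parameters are supplied by the caller, and Pc of the temperature equation is assumed to be {R_w}. Condition 1 (that m2 is an approximation of m1) is not checked. The function name is hypothetical.

```python
from itertools import permutations


def check_definition_5_3(m1, m2, local_to_m1, local_to_m2):
    """Search for a correspondence mapping G and a local causal mapping L.

    m1, m2: dict mapping an equation label to a pair (P, Pc) of parameter sets.
    Returns (G, L) if conditions 2 and 3 can be met, otherwise None.
    """
    eqs1, eqs2 = list(m1), list(m2)
    p_star = sorted(local_to_m1 - local_to_m2)
    for image in permutations(eqs1, len(eqs2)):
        G = dict(zip(eqs2, image))
        # Condition 2: P(e) and Pc(e) must be subsets for corresponding equations.
        if not all(m2[e][0] <= m1[G[e]][0] and m2[e][1] <= m1[G[e]][1] for e in eqs2):
            continue
        e_star = [e for e in eqs1 if e not in image]
        if len(e_star) != len(p_star):
            continue
        # Condition 3: an onto (here bijective) causal mapping from E* to P*.
        for targets in permutations(p_star):
            if all(p in m1[e][1] for e, p in zip(e_star, targets)):
                return G, dict(zip(e_star, targets))
    return None


main_eq = "R_w = R_w0(1 + alpha_w(T_w - T_w0))"
temperature_dependent = {
    main_eq: ({"R_w", "R_w0", "alpha_w", "T_w", "T_w0"}, {"R_w"}),
    "exogenous(R_w0)": ({"R_w0"}, {"R_w0"}),
    "exogenous(alpha_w)": ({"alpha_w"}, {"alpha_w"}),
    "exogenous(T_w0)": ({"T_w0"}, {"T_w0"}),
}
constant = {"exogenous(R_w)": ({"R_w"}, {"R_w"})}

G, L = check_definition_5_3(
    temperature_dependent,
    constant,
    local_to_m1={"R_w", "R_w0", "alpha_w", "T_w0"},
    local_to_m2={"R_w"},
)
```

The exhaustive search is exponential in the number of equations per fragment, which is acceptable only because model fragments typically contain a handful of equations; a matching algorithm would be the natural replacement for larger fragments.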

One  can  show  that  the  causal  approximation  relation  between  model  fragments  is 
transitive.  Hence,  to  check  that  all  the  approximations  are  causal  approximations,  it 
is  sufficient  to  check  that  the  immediate  approximations  of  each  model  fragment  are 
causal  approximations. 

It  is  worth  noting  that  the  restriction  that  local  parameters  in  a  model  fragment 
cannot  be  causally  determined  by  equations  of  model  fragments  in  other  assumption 
classes  is  not  a  serious  one.  It  is  easy  to  convert  a  local  parameter  into  a  shared 
parameter by defining a new assumption class. For example, to convert $\alpha_w$ (Figure 5.3)
into a shared parameter, we would (a) define a new assumption class with
one model fragment $m = \{\mathit{exogenous}(\alpha_w)\}$, and (b) remove $\mathit{exogenous}(\alpha_w)$ from the
equations of Temperature-dependent-resistance(wire-1). After this conversion,
$\alpha_w$ is not necessarily local to any assumption class, and hence can be used in multiple
assumption classes.

5.3.2  Causal approximations and the upward failure property

Causal approximations play a key role in ensuring that the upward failure property
is satisfied. In particular, we will show that when all the approximations are causal
approximations, the causal relations entailed by a model decrease monotonically as
we simplify models without dropping assumption classes. This means that if a model
does not explain the expected behavior, then a simpler model that uses the same
assumption classes also does not explain the expected behavior. It is easy to see that
this is just a restricted version of the upward failure property.

To  prove  the  above  important  result,  we  first  introduce  local  and  global  extensions 
of  causal  mappings.  We  will  then  use  the  properties  of  these  extensions  to  show 
that a causal mapping of a simpler model can be extended to a causal mapping of a
more  accurate  model,  such  that  the  latter  causal  mapping  entails  a  superset  of  causal 
relations. 

Local  extensions 

A local extension of a causal mapping $H_2$, defined on a model fragment $m_2$, is a
causal mapping defined on a more accurate model fragment $m_1$, that orients
corresponding equations in the same way.

Definition 5.4 (Local extension) Let $m_1, m_2 \in \mathcal{M}$ be model fragments such that
$m_2$ is a causal approximation of $m_1$. Let $H_1 : m_1 \to P(m_1)$ and $H_2 : m_2 \to P(m_2)$
be causal mappings, and let $G : m_2 \to m_1$ be a correspondence mapping. $H_1$ is said
to be a local extension of $H_2$ if and only if for each equation $e \in m_1$:

1.  if $G(e') = e$, for some $e' \in m_2$, then $H_1(e) = H_2(e')$;

2.  otherwise $H_1(e)$ is local to $m_1$, but not local to $m_2$.

In other words, $H_1$ and $H_2$ orient corresponding equations in the same way, and $H_1$
orients the remaining equations in $m_1$ to causally determine parameters local to $m_1$
but not local to $m_2$. An immediate consequence of this is that the range of $H_1$ contains
only parameters that are either in the range of $H_2$ or that are local to $m_1$.

The  following  lemma  tells  us  that  the  local  extension  of  a  causal  mapping  like  H2 
always  exists: 

Lemma 5.3 (Existence of local extension) Let $m_1, m_2 \in \mathcal{M}$ be model fragments
such that $m_2$ is a causal approximation of $m_1$. Let $H_2 : m_2 \to P(m_2)$ be a causal
mapping. Then there exists a causal mapping $H_1 : m_1 \to P(m_1)$ such that $H_1$ is a
local extension of $H_2$.

Proof: Let $G : m_2 \to m_1$ be a correspondence mapping, and let $L$ be a local causal
mapping with respect to $G$. $G$ and $L$ must exist because $m_2$ is a causal approximation
of $m_1$. We define $H_1$ as follows. For each $e \in m_1$, if there exists $e' \in m_2$ such that
$G(e') = e$, then let $H_1(e) = H_2(e')$. This is possible because $G$ is a correspondence
mapping and hence $P_c(e') \subseteq P_c(e)$. Otherwise, let $H_1(e) = L(e)$.

$H_1$ is well defined because the range of $L$ contains only parameters that are not
local to $m_2$, and hence not in the range of $H_2$. $H_1$ is a causal mapping because both
$H_2$ and $L$ are causal mappings. Finally, $H_1$ is a local extension of $H_2$ because $H_1$ and
$H_2$ agree on corresponding equations, and on the remaining equations $H_1$ agrees with
$L$, and hence maps these equations to parameters local to $m_1$ but not local to $m_2$. □
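
The construction in the proof of Lemma 5.3 is short enough to write down directly. The sketch below is an illustration under stated assumptions: mappings are plain dictionaries keyed by equation labels, G is the 1-1 correspondence mapping from equations of m2 to equations of m1, and L is a local causal mapping defined on the equations of m1 that have no counterpart. The function name and the equation labels are hypothetical.

```python
def local_extension(m1_equations, H2, G, L):
    """Build H1 on m1 from H2 on m2, following the proof of Lemma 5.3."""
    corresponding = {e1: e2 for e2, e1 in G.items()}  # invert the 1-1 mapping G
    H1 = {}
    for e in m1_equations:
        if e in corresponding:
            # Corresponding equations are oriented the same way as in H2.
            H1[e] = H2[corresponding[e]]
        else:
            # Remaining equations determine the newly introduced local parameters.
            H1[e] = L[e]
    return H1


main_eq = "R_w = R_w0(1 + alpha_w(T_w - T_w0))"
m1_equations = [main_eq, "exogenous(R_w0)", "exogenous(alpha_w)", "exogenous(T_w0)"]
H2 = {"exogenous(R_w)": "R_w"}
G = {"exogenous(R_w)": main_eq}
L = {
    "exogenous(R_w0)": "R_w0",
    "exogenous(alpha_w)": "alpha_w",
    "exogenous(T_w0)": "T_w0",
}
H1 = local_extension(m1_equations, H2, G, L)
```

For the resistance example of Figure 5.3, the resulting mapping orients the temperature equation to determine R_w, exactly as the exogenous equation did in the simpler fragment, and leaves the three new exogenous equations to determine their own parameters.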

Global  extensions 

Global  extensions  are  similar  to  local  extensions,  except  that  instead  of  considering 
causal  mappings  of  the  equations  of  model  fragments,  we  consider  causal  mappings 
of  the  equations  of  models. 

Definition 5.5 (Global extension) Let I be an instance of the MINIMAL CAUSAL
MODEL problem such that all the approximation relations are causal approximations.
Let $M_1, M_2 \subseteq \mathcal{M}$ be models, such that $M_1$ and $M_2$ are not overconstrained, $M_1$ and
$M_2$ have model fragments from the same assumption classes, and $M_2 < M_1$. Let
$H_1 : E(M_1) \to P(M_1)$ and $H_2 : E(M_2) \to P(M_2)$ be causal mappings. $H_1$ is said to
be a global extension of $H_2$ if for each $m_2 \in M_2$:

1.  If $m_2 \in M_1$, then for each $e \in m_2$, $H_2(e) = H_1(e)$.

2.  If $m_2 \notin M_1$, then let $m_1 \in M_1$ be such that $\mathit{approximation}(m_1, m_2)$. Then, $H_1$
restricted to $m_1$ is a local extension of $H_2$ restricted to $m_2$.

Hence, as with local extensions, $H_1$ is a global extension of $H_2$ if both $H_1$ and $H_2$
causally orient corresponding equations in the same way. It is also easy to check that
the range of $H_1$ contains only parameters that are in the range of $H_2$ or that are local
to the model fragments in $M_1$. Finally, global extensions are guaranteed to exist:

Lemma 5.4 (Existence of global extension) Let I be an instance of the MINIMAL
CAUSAL MODEL problem such that all the approximation relations are causal
approximations, and the contradictory relation partitions the set $\mathcal{M}$ of model fragments
into the set $\mathcal{A}$ of assumption classes. Let $M_1, M_2 \subseteq \mathcal{M}$ be models such that
$M_1$ and $M_2$ are not overconstrained, $M_1$ and $M_2$ have model fragments from the same
assumption classes, and $M_2 < M_1$. Let $H_2 : E(M_2) \to P(M_2)$ be a causal mapping.
Then there exists a causal mapping $H_1 : E(M_1) \to P(M_1)$ such that $H_1$ is a global
extension of $H_2$.

Proof: For the equations of each model fragment $m \in M_1$, define $H_1$ as follows:

1.  If $m \in M_2$, then define $H_1$ to be identical to $H_2$ for each equation in $E(m)$;

2.  Otherwise, there exists a unique $m' \in M_2$ such that $m'$ is a causal approximation
of $m$. Define $H_1$ on $m$ to be the local extension of $H_2$ on $m'$. This is always
possible because Lemma 5.3 tells us that such a local extension must exist. In
addition, parameters in the range of $H_1$ restricted to $m$ are either in the range
of $H_2$ restricted to $m'$, or are local to $m$. Hence, this extension does not overlap
with the definition of $H_1$ on other model fragments.

It is easy to verify that $H_1$ is, in fact, a global extension of $H_2$. □

The importance of global extensions stems from the fact that if $H_1$ is a global
extension of $H_2$, then the direct causal dependencies entailed by $H_2$ are a subset of
the direct causal dependencies entailed by $H_1$, i.e., $C_{H_2} \subseteq C_{H_1}$.

Lemma 5.5 Let I be an instance of the MINIMAL CAUSAL MODEL problem such that
all the approximation relations are causal approximations. Let $M_1, M_2 \subseteq \mathcal{M}$ be models
such that $M_1$ and $M_2$ are not overconstrained, $M_1$ and $M_2$ have model fragments
from the same assumption classes, and $M_2 < M_1$. Let $H_1 : E(M_1) \to P(M_1)$ and
$H_2 : E(M_2) \to P(M_2)$ be causal mappings such that $H_1$ is a global extension of $H_2$.
Then $C_{H_2} \subseteq C_{H_1}$.

Proof: Let $(p_1, p_2) \in C_{H_2}$. From Equation 3.5, this means that there is an equation
$e \in E(M_2)$ such that $H_2(e) = p_2$ and $p_1 \in P(e)$. Let $m \in M_2$ be the model fragment
such that $e \in m$. Now there are two cases:

1.  If $m \in M_1$, then since $H_1$ is a global extension of $H_2$, $H_1(e) = H_2(e)$, and hence
$(p_1, p_2) \in C_{H_1}$.

2.  Otherwise, there is a model fragment $m' \in M_1$ such that $m$ is a causal approximation
of $m'$. Let $G : m \to m'$ be the correspondence mapping. Let $e' = G(e)$.
Since $H_1$ is a global extension of $H_2$, it follows that $H_1$ restricted to $m'$ is a
local extension of $H_2$ restricted to $m$. Hence, $H_1(e') = H_2(e) = p_2$. Since $G$ is a
correspondence mapping, it follows that $P(e) \subseteq P(e')$. Hence, since $p_1 \in P(e)$,
it follows that $p_1 \in P(e')$. Hence, it follows that $(p_1, p_2) \in C_{H_1}$.

Hence, in either case, if $(p_1, p_2) \in C_{H_2}$, then $(p_1, p_2) \in C_{H_1}$. Hence, it follows that
$C_{H_2} \subseteq C_{H_1}$. □
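
Lemma 5.5 can be illustrated on a small instance by computing the direct causal dependencies of two mappings and comparing them. This is a sketch under assumptions: the set C_H is taken to contain the pair (p1, p2) whenever some equation e has H(e) = p2 and p1 is another parameter of e, which is how the proof above uses Equation 3.5; the two models and the equation labels are hypothetical.

```python
def direct_dependencies(H, P):
    """Pairs (p1, p2) such that some equation e has H(e) = p2 and p1 in P(e), p1 != p2."""
    return {(p1, p2) for e, p2 in H.items() for p1 in P[e] if p1 != p2}


# Parameters P(e) of each equation (toy instance, assumed for illustration).
main_eq = "R_w = R_w0(1 + alpha_w(T_w - T_w0))"
P = {
    "V_w = i_w R_w": {"V_w", "i_w", "R_w"},
    "exogenous(R_w)": {"R_w"},
    main_eq: {"R_w", "R_w0", "alpha_w", "T_w", "T_w0"},
    "exogenous(R_w0)": {"R_w0"},
    "exogenous(alpha_w)": {"alpha_w"},
    "exogenous(T_w0)": {"T_w0"},
}

# M2 uses Resistor and Constant-resistance; M1 replaces the latter with
# Temperature-dependent-resistance. H1 is a global extension of H2.
H2 = {"V_w = i_w R_w": "V_w", "exogenous(R_w)": "R_w"}
H1 = {
    "V_w = i_w R_w": "V_w",
    main_eq: "R_w",
    "exogenous(R_w0)": "R_w0",
    "exogenous(alpha_w)": "alpha_w",
    "exogenous(T_w0)": "T_w0",
}

C_H2 = direct_dependencies(H2, P)
C_H1 = direct_dependencies(H1, P)
```

Here the simpler mapping yields only the dependencies of V_w on i_w and R_w, while the extended mapping retains those and adds the dependencies of R_w on the temperature-related parameters.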

Monotonicity  of  causal  relations 

The  main  theorem  of  this  section  is  an  immediate  consequence  of  the  above  lemma: 
if  we  simplify  a  model  without  dropping  any  assumption  classes,  then  the  entailed 
causal  relations  decrease  monotonically. 

Theorem 5.2 Let I be an instance of MINIMAL CAUSAL MODEL such that all the
approximation relations are causal approximations, and the contradictory relation
partitions the set $\mathcal{M}$ of model fragments into the set $\mathcal{A}$ of assumption classes. Let
$M_1, M_2 \subseteq \mathcal{M}$ be complete models such that $M_1$ and $M_2$ contain model fragments from
the same assumption classes, and $M_2 < M_1$. The causal relations entailed by the
equations of $M_2$ are a subset of the causal relations entailed by the equations of $M_1$,
i.e., $C(E(M_2)) \subseteq C(E(M_1))$.

Proof: Let $H_2 : E(M_2) \to P(M_2)$ be an onto causal mapping. Such an onto causal
mapping must exist because $M_2$ is complete. Let $H_1 : E(M_1) \to P(M_1)$ be the global
extension of $H_2$. Since $M_1$ is complete, and hence not overconstrained, Lemma 5.4
tells us that such a global extension must exist. Since $H_1$ is defined on each equation
in $E(M_1)$, and $M_1$ is complete, it follows that $H_1$ is an onto causal mapping.

Lemma 5.5 tells us that $C_{H_2} \subseteq C_{H_1}$. Hence, $tc(C_{H_2}) \subseteq tc(C_{H_1})$, and hence
$C(E(M_2)) \subseteq C(E(M_1))$, i.e., the causal relations entailed by the equations of $M_2$ are
a subset of the causal relations entailed by the equations of $M_1$. □

Hence, if a coherent model does not explain the expected behavior, it follows that
no simpler coherent model that uses the same set of assumption classes can explain
the expected behavior. Hence, when all the approximations are causal approximations,
a restricted version of the upward failure property is satisfied. Note that, unlike
the upward failure property, it is easy to decide whether or not all the approximations
are causal approximations. In particular, one can easily check whether a particular
approximation is a causal approximation, and the transitivity of the causal approximation
relation tells us that we need only check that all the immediate approximations
of a model fragment are causal approximations.

5.3.3  Causal  approximations  are  common 

Causal approximations are particularly useful, because they are commonly found
in modeling the physical world. The following is a partial list of commonly used
approximations that are causal approximations.

1.  Disregarding  the  translational  and  rotational  inertia  of  an  object 

2.  Disregarding  relativistic  effects 

3.  Rigid bodies

4.  Frictionless motion

5.  Zero  or  uniform  gravitational  fields 

6.  Elastic  collisions 

7.  Ideal gas law

8.  Ideal thermal conductors and insulators

9.  Constant  thermal  conductance 

10.  Ideal  electrical  conductors  and  insulators 

11.  Constant  electrical  resistance  and  resistivity 

12.  Ideal  heat  engine 

13.  Inviscid flow

14.  Disregarding  thermal  expansion 

The details of the above causal approximations, including the actual equations
used, can be found in Appendix A. The ubiquity of causal approximations suggests
that we have identified an important property of commonly occurring instances of
the MINIMAL CAUSAL MODEL problem.

In  summary,  we  have  shown  that  when  all  the  approximation  relations  are  causal 
approximations,  if  we  restrict  ourselves  to  models  that  select  a  model  fragment  from 
each  assumption  class  in  a  fixed  set  of  assumption  classes,  then  the  upward  failure 
property  is  satisfied.  Hence,  the  basic  restriction  that  we  place  on  the  instance  I  of 
the  MINIMAL  CAUSAL  MODEL  problem  is: 

•  All the approximation relations between model fragments are causal approximations.

5.4  Selecting assumption classes

In  the  previous  section,  we  investigated  the  case  where  all  models  were  required  to 
have  a  model  fragment  from  each  assumption  class  in  some  fixed  set  of  assumption 
classes. We showed that when all the approximation relations were causal approximations,
then the causal relations entailed by a simpler model were a subset of the causal
relations entailed by a more complex model. In this section we extend this result to
all  models,  i.e.,  where  models  can  also  be  simplified  by  dropping  model  fragments. 

5.4.1  Causal  approximations  are  not  enough 

A simple example illustrates that causal approximations alone are not sufficient. Let
$A_1 = \{m_{11}, m_{12}\}$ and $A_2 = \{m_2\}$ be assumption classes, and let the equations of
model fragments $m_{11}$, $m_{12}$, and $m_2$ be defined as follows:

\[ m_{11} = \{x = y,\; y = z\} \]
\[ m_{12} = \{x = y,\; \mathit{exogenous}(y)\} \]
\[ m_2 = \{\mathit{exogenous}(x)\} \]

Furthermore, let $m_{12}$ be an approximation of $m_{11}$:

\[ \mathit{approximation}(m_{11}, m_{12}) \]

It is easy to verify that $m_{12}$ is a causal approximation of $m_{11}$. Let $M_1 = \{m_{11}, m_2\}$
and $M_2 = \{m_{12}\}$ be two models. Assuming that there are no propositional coherence
constraints, it is easy to verify that both $M_1$ and $M_2$ are coherent models, and that
$M_2 < M_1$.


(a) Causal ordering from $M_1$    (b) Causal ordering from $M_2$

Figure 5.4: Causal orderings generated from $M_1$ and $M_2$


Figure 5.4 shows the causal orderings generated from these two models. In particular,
$y$ causally depends on $x$ in the causal ordering generated from $M_1$, while $x$
causally depends on $y$ in the causal ordering generated from $M_2$. Hence, in simplifying
$M_1$ to $M_2$, the causal relations have not decreased monotonically. This means
that the upward failure property is not satisfied.

On the other hand, if we replace $m_2$ with $m_2'$, where:

\[ m_2' = \{\mathit{exogenous}(z)\} \]

then the causal ordering generated using $M_1' = \{m_{11}, m_2'\}$ is shown in Figure 5.5. In
this case, it is easy to verify that the causal relations have decreased monotonically
in going from $M_1'$ to $M_2$.


z → y → x

Figure 5.5: Causal ordering generated from $M_1'$

Intuitively, the difference between the two cases can be summarized as follows.
In the first case, $M_2$ did not include all phenomena that were possibly “relevant”
to its parameters. In particular, $M_2$ used the parameter $x$, but did not include $m_2$,
even though an equation in $m_2$ could causally determine $x$. On the other hand, in the
second case, $M_2$ included models of all phenomena relevant to $x$ and $y$: the equations
of $m_2'$ can only determine $z$.
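
The two cases of the example in this section can be replayed mechanically. The sketch below is illustrative and rests on assumptions: each equation is summarized by a pair (P(e), Pc(e)), an onto causal mapping is found by brute force over permutations, and the entailed causal relations are taken to be the transitive closure of the direct dependencies. The function name is hypothetical.

```python
from itertools import permutations


def causal_relations(equations):
    """Return tc(C_H) for the first onto causal mapping H found by brute force.

    equations: dict mapping an equation label to a pair (P, Pc) of parameter sets.
    """
    labels = list(equations)
    params = sorted(set().union(*(P for P, _ in equations.values())))
    if len(labels) != len(params):
        raise ValueError("the equations are not complete")
    for image in permutations(params):
        if all(p in equations[e][1] for e, p in zip(labels, image)):
            H = dict(zip(labels, image))
            break
    else:
        raise ValueError("no onto causal mapping exists")
    closure = {(p1, H[e]) for e in labels for p1 in equations[e][0] if p1 != H[e]}
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


x_eq_y = ({"x", "y"}, {"x", "y"})
y_eq_z = ({"y", "z"}, {"y", "z"})
m11 = {"x = y": x_eq_y, "y = z": y_eq_z}
m12 = {"x = y": x_eq_y, "exogenous(y)": ({"y"}, {"y"})}
m2 = {"exogenous(x)": ({"x"}, {"x"})}
m2_prime = {"exogenous(z)": ({"z"}, {"z"})}

relations_M1 = causal_relations({**m11, **m2})              # M1  = {m11, m2}
relations_M2 = causal_relations(m12)                        # M2  = {m12}
relations_M1_prime = causal_relations({**m11, **m2_prime})  # M1' = {m11, m2'}
```

With these inputs, the relations of M2 are not contained in those of M1, because the direction between x and y is reversed, but they are contained in those of M1', matching the discussion of Figures 5.4 and 5.5.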

We can use the above intuition to ensure that the causal relations entailed by
coherent models decrease monotonically as models become simpler, even when assumption
classes can be dropped. We formalize this intuition by first introducing the
ownership of parameters by assumption classes, and then introducing a set of ownership
constraints that will ensure that coherent models include all possibly “relevant”
phenomena.

5.4.2  Parameter  ownership 

The parameters owned by an assumption class are the parameters that can be causally
determined by some equation of some model fragment in the assumption class.

Definition 5.6 (Parameter ownership) The parameters owned by an assumption
class $A$, denoted by $\mathit{owns}(A)$, are the parameters that can be causally determined by
the equations of model fragments of $A$:

\[ \mathit{owns}(A) = \bigcup_{m \in A} P_c(m) \]

One can view an assumption class as being possibly “relevant” to the parameters
that it owns. For example, the Electrical-conductor(wire-1) assumption class,
shown in Figure 5.6, owns $V_w$ and $i_w$, but not $R_w$. Hence, model fragments from this
assumption class are possibly “relevant” to causally determining $V_w$ and $i_w$, but not
$R_w$.

Ideal-conductor(wire-1):  $V_w = 0$

Ideal-insulator(wire-1):  $i_w = 0$

Resistor(wire-1):  $V_w = i_w R_w$

Figure 5.6: Model fragments in the Electrical-conductor(wire-1) assumption
class


5.4.3  Ownership  constraints 

We can ensure that coherent models will contain model fragments from all possibly
“relevant” assumption classes, by adding constraints of the form

\[ m \Rightarrow A \]

to the set $\mathcal{C}$ of propositional coherence constraints, whenever assumption class $A$
owns a parameter that can be causally determined by an equation in $m$, i.e., when
$P_c(m) \cap \mathit{owns}(A)$ is not empty. This will ensure that whenever a coherent model
contains model fragment $m$, it will also contain a model fragment from $A$. We call
the above set of constraints ownership constraints:

Definition 5.7 (Ownership constraints) Let I be an instance of the MINIMAL
CAUSAL MODEL problem. The set $O$ of ownership constraints of I is defined as
follows:

\[ O = \{\, m \Rightarrow A \;:\; m \in \mathcal{M} \wedge A \in \mathcal{A} \wedge P_c(m) \cap \mathit{owns}(A) \neq \emptyset \,\} \]
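
Definitions 5.6 and 5.7 translate almost directly into set operations. The following sketch is an illustration under assumptions: each model fragment is summarized by Pc(m), a constraint is represented as a pair (m, A) standing for "if m is in the model, some fragment of A is too", and the data come from the example of Section 5.4.1. The function names are hypothetical.

```python
def owns(members, pc):
    """Parameters owned by an assumption class: the union of Pc(m) over its members."""
    return set().union(*(pc[m] for m in members))


def ownership_constraints(classes, pc):
    """All pairs (m, A) such that Pc(m) and owns(A) share at least one parameter."""
    return {
        (m, name)
        for m in pc
        for name, members in classes.items()
        if pc[m] & owns(members, pc)
    }


def satisfies(model, constraints, classes):
    """True if every constraint triggered by a fragment in the model is met."""
    return all(model & classes[a] for m, a in constraints if m in model)


# Data from the example of Section 5.4.1 (Pc values assumed from the equations).
classes = {"A1": {"m11", "m12"}, "A2": {"m2"}}
pc = {"m11": {"x", "y", "z"}, "m12": {"x", "y"}, "m2": {"x"}}

constraints = ownership_constraints(classes, pc)
m2_model_ok = satisfies({"m12"}, constraints, classes)
m1_model_ok = satisfies({"m11", "m2"}, constraints, classes)
```

Under these constraints the model {m12} is no longer coherent, because m12 can determine x, which A2 owns, yet the model contains no fragment from A2. This is exactly the situation that broke monotonicity in the example.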

5.4.4  Monotonicity of causal relations

When  C  contains  all  the  ownership  constraints,  so  that  coherent  models  contain  model 
fragments  from  all  possibly  “relevant”  assumption  classes,  we  can  extend  Theorem  5.2 
to all coherent models. We start with a lemma that states that if $E_1$ and $E_2$ are
complete sets of equations such that $C(E_2) \subseteq C(E_1)$, then removing the same set of
exogenous equations from both $E_1$ and $E_2$ preserves this property.

Lemma 5.6 Let $E_1$ and $E_2$ be complete sets of equations. Let $Q = \{q_1, \ldots, q_n\}$ be
a set of parameters, and let $D = \bigcup_{q \in Q} \{\mathit{exogenous}(q)\}$ be a set of equations such that
$D \subseteq E_1$ and $D \subseteq E_2$. Let $H_1' : E_1 \to P(E_1)$ and $H_2' : E_2 \to P(E_2)$ be onto causal
mappings. Let $H_1$ be $H_1'$ restricted to the equations in $E_1 \setminus D$, and let $H_2$ be $H_2'$
restricted to the equations in $E_2 \setminus D$. If $tc(C_{H_2'}) \subseteq tc(C_{H_1'})$, then $tc(C_{H_2}) \subseteq tc(C_{H_1})$.

Proof: We first show that $tc(C_{H_1'}) = tc(C_{H_1})$. For any equation $\mathit{exogenous}(q_i)$ in
$D$, $H_1'(\mathit{exogenous}(q_i)) = q_i$. Hence, there is no parameter $p$ such that $(p, q_i) \in C_{H_1'}$.
Hence, there is no parameter $p$ such that $(p, q_i) \in tc(C_{H_1'})$. Hence, if $(p, q) \in tc(C_{H_1'})$,
it follows that there is no parameter $q_i \in Q$ such that $(p, q_i) \in tc(C_{H_1'})$ and $(q_i, q) \in
tc(C_{H_1'})$. Hence, any causal path from $p$ to $q$ using $H_1'$ does not involve any equation
in $D$. Hence, this causal path must exist in the causal ordering using $H_1$. Hence, it
follows that $tc(C_{H_1'}) = tc(C_{H_1})$. Similarly, $tc(C_{H_2'}) = tc(C_{H_2})$. Hence, if $tc(C_{H_2'}) \subseteq
tc(C_{H_1'})$, it follows that $tc(C_{H_2}) \subseteq tc(C_{H_1})$. □
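
The key step of the proof, that exogenous equations contribute no causal dependencies, can be checked on a small instance. The sketch below is illustrative only: it assumes that C_H contains (p1, p2) whenever some equation e has H(e) = p2 and p1 is another parameter of e, and the equations and helper names are hypothetical.

```python
def direct_dependencies(H, P):
    """Pairs (p1, p2) such that some equation e has H(e) = p2 and p1 in P(e), p1 != p2."""
    return {(p1, p2) for e, p2 in H.items() for p1 in P[e] if p1 != p2}


def transitive_closure(pairs):
    """Smallest transitive relation containing the given pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


# E1 = {x = y, y = z, exogenous(z)} with an onto causal mapping H1_full.
P = {"x = y": {"x", "y"}, "y = z": {"y", "z"}, "exogenous(z)": {"z"}}
H1_full = {"x = y": "x", "y = z": "y", "exogenous(z)": "z"}

# Remove the exogenous equations in D and restrict the mapping accordingly.
D = {"exogenous(z)"}
H1 = {e: p for e, p in H1_full.items() if e not in D}

full = transitive_closure(direct_dependencies(H1_full, P))
restricted = transitive_closure(direct_dependencies(H1, P))
```

Because the exogenous equation mentions only the parameter it determines, dropping it leaves both the direct dependencies and their transitive closure unchanged.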

Using  the  above  lemma  we  can  generalize  Theorem  5.2  to  all  coherent  models. 

Theorem 5.3 Let I be an instance of MINIMAL CAUSAL MODEL such that all the
approximation relations are causal approximations, and the contradictory relation
partitions the set $\mathcal{M}$ of model fragments into the set $\mathcal{A}$ of assumption classes. Let $\mathcal{C}$
contain all the ownership constraints of I. Let $M_1, M_2 \subseteq \mathcal{M}$ be coherent models such
that $M_2 < M_1$. The causal relations entailed by the equations of $M_2$ are a subset of
the causal relations entailed by the equations of $M_1$, i.e., $C(E(M_2)) \subseteq C(E(M_1))$.

Proof: Let $H_1 : E(M_1) \to P(M_1)$ and $H_2 : E(M_2) \to P(M_2)$ be onto causal
mappings. $H_1$ and $H_2$ must exist because $M_1$ and $M_2$ are coherent models. We will
now use $H_1$ and $H_2$ to construct an onto causal mapping $H : E(M_1) \to P(M_1)$ such
that $C_{H_2} \subseteq tc(C_H)$.

Let  us  partition  Mi  into  two  mutually  disjoint  sets  Mu  and  Mu  such  that  Mu  and 
M2  have  model  fragments  from  the  same  assumption  claisses,  and  Mu  and  M2  have 
no  model  fragments  from  the  same  assumption  classes.  Let  Hu  •  E{Mu)  P{Mu) 
be  Hi  restricted  to  the  the  equations  in  E{Mu),  and  let  Hu  '  E{Mu)  P{Mu)  be 
Hi  restricted  to  the  equations  in  E(Mu)- 

Let  Pu  be  the  parameters  in  the  range  of  Hu-  From  the  definition  of  Hu  and 
Hu,  it  is  easy  to  see  that  they  have  mutually  disjoint  ranges.  Hence,  no  parameter 
in  Pu  is  in  the  range  of  Hu-  In  addition,  since  M2  is  a  coherent  model,  it  follows 
that  all  the  ownership  constraints  are  satisfied.  Hence,  since  M2  contains  no  model 
fragments  from  the  assumption  classes  used  in  Mu,  it  follows  that  no  parameter  in 
P{M2)  is  owned  by  an  assumption  class  used  in  Mu-  Hence,  no  parameter  in  the 
range  of  H2  is  in  Pu- 

Let m be a new model fragment consisting of the following equations:

m = {exogenous(p) : p ∈ P_12}

i.e., m makes each parameter in P_12 exogenous. One can see that the equations of m
are complete.

Since there is no overlap between P_12 and P(M_2), and since M_2 is complete, it
follows that M'_2 = M_2 ∪ {m} is a complete model. Let H'_2 : E(M'_2) → P(M'_2) be an
onto causal mapping, such that H'_2 restricted to E(M_2) is the same as H_2.

Now consider the set M'_11 = M_11 ∪ {m}. We show that M'_11 is complete by defining
a causal mapping H'_11 : E(M'_11) → P(M'_11). Let H'_11 restricted to E(M_11) be identical
to H_11. For any equation exogenous(p) in m, let H'_11(exogenous(p)) = p. This is
possible because the range of H_11 and the parameters in P_12 do not overlap. Clearly,
H'_11 is defined on each equation in E(M'_11).

Finally, M'_11 is complete because every parameter in P(M'_11) is either in P(M_11)
or in P_12. Every parameter in P(M_11) is either in the range of H_11 or in the range
of H_12. Hence, every parameter in P(M'_11) is either in the range of H_11 or in P_12.
Hence, every parameter in P(M'_11) is in the range of H'_11. Hence, H'_11 is an onto
causal mapping and hence M'_11 is complete.

But M'_11 and M'_2 have model fragments from the same set of assumption classes
(assuming that m is in an assumption class by itself), and both these models are
complete. In addition, since M_2 < M_1, it follows that M'_2 < M'_11. Hence, Theorem 5.2
tells us that C(E(M'_2)) ⊆ C(E(M'_11)). Hence, tc(C_{H'_2}) ⊆ tc(C_{H'_11}).

Applying Lemma 5.6, with D = m and Q = P_12, we have tc(C_{H_2}) ⊆ tc(C_{H_11}).
But H_11 is just H_1 restricted to E(M_11). Hence, it follows that tc(C_{H_2}) ⊆ tc(C_{H_1}).
Hence, C(E(M_2)) ⊆ C(E(M_1)). □

In summary, to ensure that the upward failure property is satisfied, even
when models can be simplified by dropping model fragments, we must place the
following restriction on I:

• The set C of propositional coherence constraints of I must contain all the
ownership constraints of I, as defined in Definition 5.7.

5.4.5  Discussion 

The  above  theorem  tells  us  that,  with  the  addition  of  the  ownership  constraints,  the 
causal  relations  entailed  by  models  decrease  monotonically  as  models  become  simpler. 
But  how  reasonable  are  the  ownership  constraints?  On  the  face  of  it,  they  seem  quite 
restrictive.  However,  under  certain  circumstances,  we  get  the  ownership  constraints 
for  free.  In  particular,  consider  the  situation  in  which  each  equation  can  causally 
determine  exactly  one  parameter.  This  situation  is  found  in  QP  Theory  [Forbus, 
1984]  and  its  derivatives,  e.g.,  [Falkenhainer  and  Forbus,  1991].  When  each  equation 
can  causally  determine  exactly  one  parameter,  one  can  see  that  all  the  parameters 
are  local  to  some  assumption  class.  This  means  that  no  model  fragment  can  causally 
determine a parameter owned by a different assumption class. Hence, under this
situation,  there  are  no  ownership  constraints! 

However, as we have argued earlier, the constraint that each equation can causally
determine exactly one parameter is also restrictive. In the absence of this constraint,
the ownership constraints appear to be necessary to guarantee that the upward failure
property is satisfied. However, in practice, we have found that the upward failure
property is satisfied even in the absence of the ownership constraints. In other words,
the pathological cases, like the one shown at the beginning of this section, do not
appear to occur naturally. We conjecture that the reason for this is that most equations
describing the physical world do seem to have a natural causal orientation (as
required in QP Theory), and the few equations that do allow multiple causal orientations
do not lead to pathological situations. Hence, in the modeling program that
we describe in Chapter 8, no ownership constraints are included.


5.5  Individually  approximating  model  fragments 

The last two sections introduced two local properties of I, causal approximations
and  ownership  constraints,  that  ensure  that  the  global  upward  failure  property  is 
satisfied.  In  this  section,  and  in  the  next  section,  we  turn  to  the  other  important 
element  of  the  efficient  model  selection  algorithm:  the  efficient  generation  of  the 
immediate  simplifications  of  coherent  models. 

A coherent model M can be simplified by (a) replacing one or more model fragments
in M by their approximations; and/or (b) dropping one or more model fragments
from M. In this section we prove a very important property of the simplifications
of M which states that the model fragments in M can be approximated one at a
time. More precisely, let m ∈ M be any model fragment, and let m' be any immediate
approximation of m. We show that if the model resulting from replacing m by m' is
not complete, then no coherent model simpler than M can contain m' or any of its
approximations. Hence, if the model fragments of M cannot be approximated one at
a time, there is no point approximating them two or more at a time. This property
will be exploited in Section 5.7 to efficiently simplify a causal model.

Theorem 5.4 Let I be an instance of the MINIMAL CAUSAL MODEL problem such
that all the approximations are causal approximations and C contains all the ownership
constraints. Let M ⊆ ℳ be a coherent model. Let m ∈ M be any model fragment, and
let m' ∈ ℳ be an immediate approximation of m. Let M' be the result of replacing
m by m' in M, i.e., M' = (M \ {m}) ∪ {m'}. Let m" ∈ ℳ be any model fragment
such that either m" = m' or m" is one of the approximations of m'. If E(M') is not
complete, then every model M" < M, such that m" ∈ M", is not coherent.

Proof: The proof is by contradiction. Assume that there exists M" < M such that
m" ∈ M" and M" is coherent. We will now show that E(M') is complete by showing
that there exists an onto causal mapping F' : E(M') → P(M').

Let F : E(M) → P(M) and F" : E(M") → P(M") be onto causal mappings. F
and F" must exist because M and M" are coherent. We use F and F" to define F'
as follows.

Partition M' into two sets M'_1 and M'_2, such that M'_1 contains all the model
fragments in M' that are in the same assumption class as a model fragment in M",
and M'_2 contains all the model fragments in M' that are not in the same assumption
class as any model fragment in M". Hence, m' ∈ M'_1, and M'_2 ⊆ M.

We define F' by first defining it on equations in E(M'_1), and then on equations in
E(M'_2).

Let m'_1 be a model fragment in M'_1, and let m"_1 be the model fragment in M" that
is in the same assumption class as m'_1. It is easy to see that m"_1 is either identical to
m'_1, or is an approximation of m'_1. If m"_1 is identical to m'_1, then define F' on each
equation in m'_1 to be identical to F". If m"_1 is an approximation of m'_1, then use
Lemma 5.3 to define F' restricted to m'_1 to be the local extension of F" restricted to
m"_1. In either case, every parameter in the range of F', when restricted to m'_1, is either
in the range of F" restricted to m"_1, or is local to m'_1. Hence, F' maps equations in
different model fragments in M'_1 to different parameters. Hence, the above mapping
maps each equation in E(M'_1) to a unique parameter.

Define F' on equations in E(M'_2) to be identical to F. This is possible because
each model fragment in M'_2 is also in M. In addition, since M" is coherent, it satisfies
all the ownership constraints. Hence, it follows that P_c(M") and P_c(M'_2) have no
parameters in common. Hence, the range of F" and the range of F restricted to E(M'_2)
are disjoint. Hence, the above extension of F' to equations in E(M'_2) is well defined.

The above definition of F' maps each equation in E(M') to a parameter in P(M').
Hence, |E(M')| ≤ |P(M')|. However, in going from M' to M, we replace m' by m,
which introduces at least as many new local parameters as equations. Since M is
coherent, it follows that |E(M)| = |P(M)|, and hence |E(M')| = |P(M')|. Hence, F'
is an onto causal mapping, and hence M' is complete, which is a contradiction. □
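
The final step of the proof rests on the observation that a one-to-one causal mapping between equally many equations and parameters must be onto. As a minimal sketch of how completeness might be checked, the following Python function searches for an onto causal mapping using a standard augmenting-path bipartite matching. It assumes, for simplicity, that every equation can causally determine any parameter it mentions; the function name and the dictionary representation are hypothetical and are not taken from the text.

```python
def find_onto_causal_mapping(equations):
    """equations: dict mapping an equation name to the set of parameters it mentions.

    Returns a dict equation -> parameter that is one-to-one and onto,
    or None if no such mapping exists (the equations are not complete).
    """
    params = set().union(*equations.values())
    if len(equations) != len(params):
        return None  # a one-to-one mapping cannot be onto
    owner = {}  # parameter -> equation currently assigned to it

    def assign(eq, seen):
        # Try to give eq a parameter, reassigning other equations if needed.
        for p in sorted(equations[eq]):
            if p in seen:
                continue
            seen.add(p)
            if p not in owner or assign(owner[p], seen):
                owner[p] = eq
                return True
        return False

    for eq in sorted(equations):
        if not assign(eq, set()):
            return None
    return {eq: p for p, eq in owner.items()}


# Hypothetical examples: two equations over two parameters, and one
# equation over two parameters.
complete = find_onto_causal_mapping({"e1": {"a", "b"}, "e2": {"b"}})
incomplete = find_onto_causal_mapping({"e1": {"a", "b"}})
```

In the first example e2 can only determine b, which forces e1 to determine a; in the second there are more parameters than equations, so no onto mapping exists.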


5.6  Expressivity  of  constraints 

Using the above theorem, it is possible to show that the immediate simplifications
of any coherent model can be generated in polynomial time, as long as the only
constraints in C are the ownership constraints. The complexity of generating the
immediate simplifications of a coherent model in the presence of additional constraints
is critically dependent upon the expressive power of these constraints. In particular,
in Chapter 4, we showed that the MINIMAL CAUSAL MODEL problem becomes
intractable if C has (a) negative clauses, i.e., clauses with all negative literals; or
(b) definite horn clauses, i.e., clauses with exactly one positive literal. Hence,
allowing such constraints will defeat any hopes of efficiently generating the immediate
simplifications of a coherent model.

Fortunately, there is a class of constraints, only slightly different from horn clauses,
that does not lead to intractability. In particular, we will allow constraints of the form:

m_1 ∧ m_2 ∧ ... ∧ m_n ⇒ A        (5.5)

where m_1, m_2, ..., m_n are model fragments, for some n ≥ 0,^ and A is an assumption
class. Recall that using an assumption class in a propositional coherence constraint is
just a shorthand for the disjunction of the model fragments in that assumption class.
Hence, the consequents of constraints that have the above form are restricted to be
the disjunction of all model fragments in an assumption class.
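
As an illustration of this constraint form, the following Python sketch checks whether a model satisfies a constraint whose antecedent is a conjunction of model fragments and whose consequent is an assumption class. The representation (a pair of an antecedent list and a class name) and all fragment and class names are hypothetical, chosen only for the example.

```python
def satisfies(model, constraint, assumption_classes):
    """Check one constraint m_1 ∧ ... ∧ m_n ⇒ A against a model (a set of fragments)."""
    antecedent, class_name = constraint
    if not set(antecedent) <= model:
        return True  # antecedent is false, so the constraint holds vacuously
    # The consequent is the disjunction of all fragments in the assumption class.
    return bool(model & assumption_classes[class_name])


# Hypothetical assumption classes and constraints.
assumption_classes = {
    "resistance": {"ohmic-resistor", "ideal-conductor"},
    "thermal": {"heating", "constant-temperature"},
}
constraint = (["ohmic-resistor"], "thermal")  # n = 1
unconditional = ([], "resistance")            # n = 0: antecedent is true
```

With n = 0 the constraint simply requires that some model fragment from the assumption class be present.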

In  practice,  the  restricted  expressivity  of  the  propositional  coherence  constraints 
has  not  proved  to  be  a  limitation.  This  is  because  our  focus  on  the  task  of  generating 
parsimonious  causal  explanations  has  made  the  expected  behavior  a  central  criterion 
for defining model adequacy. This has decreased the importance of the propositional
coherence  constraints  in  defining  model  adequacy,  and  hence  the  restricted  expressive 
power  has  not  proved  to  be  problematic. 

^When n = 0, there are no model fragments in the antecedent of the constraint, and the
antecedent is assumed to be true.

It is worth noting that the ownership constraints, introduced above, and the
requires constraints, introduced in Chapter 2, are special cases of the above type of
constraint with n = 1. However, it is also worth noting that this restriction limits
the set of model fragment classes that can have specializations: only model fragment
classes that do not contradict any model fragment class (so that they are the only
members of their assumption class) can have specializations.

In the rest of this chapter, we assume that all the constraints in C have the above
form. That is, we place the following restriction on the instance I of the MINIMAL
CAUSAL MODEL problem:

• The form of each constraint in C is required to be as shown in Equation 5.5.

In the next section we show that restricting the expressivity of constraints in this
way does not lead to intractability.


5.7  Efficiently  simplifying  a  coherent  model 

In this section we use the restrictions discussed thus far to develop an efficient
algorithm for finding an adequate model. To summarize these restrictions, we assume
that the instance

I = (ℳ, contradictory, approximation, A, C, p, q)

of the MINIMAL CAUSAL MODEL problem satisfies the following restrictions:

Definition 5.8 (Restrictions on I) The list of restrictions on I introduced in this
chapter are as follows:

1. The contradictory relation partitions the set of model fragments in ℳ into the
set A of assumption classes.

2. Each assumption class in A contains a single most accurate model fragment.

3. The most accurate model of I is coherent.

4. All the approximation relations are causal approximations.

5. C contains all the ownership constraints of I.

6.  The  form  of  each  constraint  in  C  is  as  shown  in  Equation  5.5. 

The algorithm for efficiently finding an adequate model is a two-step procedure. In
the first step, the most accurate model is simplified using the function
find-minimal-causal-model, shown in Figure 5.1, with the simplifications function being restricted
to simplifying a model by approximating it, i.e., by replacing a model fragment by one
of its immediate approximations. In the second step, the resulting model is simplified
by dropping all unnecessary model fragments. We will show that the resulting model
is, indeed, a minimal causal model.

5.7.1  Simplifying  a  model  by  approximating 

The first step of simplification simplifies the most accurate model by replacing model
fragments with their immediate approximations. Simplifying a model by replacing a
model fragment with an immediate approximation is called simplifying by
approximating.

Definition 5.9 (Simplifying by approximating) Let M ⊆ ℳ be a coherent
model, and let m ∈ M be any model fragment. Let m' be an immediate approximation
of m. Let M' = (M \ {m}) ∪ {m'}, i.e., M' is the result of replacing m by m' in
M. If M' is coherent, then M' is said to be an immediate simplification of M by
approximating.

Properties  of  immediate  simplifications  by  approximating 

The following four lemmas state the important properties of immediate simplifications
by approximating. The first two lemmas show that the only immediate simplifications
of a coherent model that do not drop any assumption classes are the immediate
simplifications by approximating. The next two lemmas show that the immediate
simplifications by approximating can be generated in polynomial time.

It is easy to see that if M' is an immediate simplification of M by approximating,
then M' is one of M's immediate simplifications:

Lemma 5.7 Let I be an instance of the MINIMAL CAUSAL MODEL problem that
satisfies the restrictions in Definition 5.8, and let M, M' ⊆ ℳ be coherent models
such that M' is an immediate simplification of M by approximating. Then M' ∈
simplifications(M, I).

Proof: From the definition of model simplicity, it is easy to see that there is no model
that is strictly between M and M' in the simplicity partial ordering. □

We  now  show  that  the  only  immediate  simplifications  of  M  that  do  not  drop  any 
assumption  classes  are  the  immediate  simplifications  of  M  by  approximating.  This  is 
a  straightforward  consequence  of  Theorem  5.4. 

Lemma 5.8 Let I be an instance of the MINIMAL CAUSAL MODEL problem that
satisfies all the restrictions in Definition 5.8, and let M ⊆ ℳ be a coherent model that
contains a model fragment from every assumption class in A. Let M' be an immediate
simplification of M, i.e., M' ∈ simplifications(M), and let M and M' have model
fragments from the same assumption classes. Then M' is an immediate simplification
of M by approximating.

Proof: The proof is by contradiction. Assume that M' is not an immediate
simplification by approximating. Let m' ∈ M' be a model fragment that is not in M.
Since M and M' have model fragments from the same assumption classes, and since
M' < M, let m ∈ M be such that m' is an approximation of m. Let m" be a model
fragment such that m" is an immediate approximation of m, and m' is either identical
to m" or m' is an approximation of m". Let M" = (M \ {m}) ∪ {m"}. Since M' < M,
it is easy to see that M' ≤ M". There are now three cases:

1. If M' is the same as M", then M' is an immediate simplification of M by
approximating, which is a contradiction.

2. If M' < M", and M" is coherent, then M' is not an immediate simplification
of M, which is a contradiction.

3. If M' < M", and M" is not coherent, then the equations of M" are not complete.
This follows from the fact that M, and hence M", contains a model fragment
from each assumption class, and hence every constraint in C must be satisfied.
Hence, the only reason that M" is not coherent is because the equations of M"
are not complete. But since M' contains a model fragment, m', that is either
identical to m" or is an approximation of m", Theorem 5.4 tells us that M' is
not coherent, which is a contradiction.

Hence,  in  all  three  cases,  we  encounter  a  contradiction.  Hence,  all  the  immediate 
simplifications  of  M,  that  use  a  model  fragment  from  each  assumption  class  in  M, 
are  immediate  simplifications  by  approximating.  □ 


function simp-by-approximating(M, I)
    /* Returns the immediate simplifications of M by approximating */
    result ← nil
    for every m ∈ M do
        for every m' that is an immediate approximation of m do
            M' ← (M \ {m}) ∪ {m'}
            if M' is coherent then
                /* M' is an immediate simplification of M by approximating */
                Add M' to result
            endif
        endfor
    endfor
    return result
end


Figure  5.7:  The  function  simp-by-approximating. 

Figure 5.7 shows the simp-by-approximating function, which returns the immediate
simplifications of a model by approximating. It is easy to see that this function
returns all the immediate simplifications of M by approximating.
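
The function in Figure 5.7 can be rendered as a short Python sketch. This is an illustration under stated assumptions rather than the implementation used in the modeling program: a model is taken to be a frozenset of fragment names, the immediate approximations are supplied as a dictionary, and the coherence test is passed in as a predicate. The fragment names and the toy coherence test are hypothetical.

```python
def simp_by_approximating(model, immediate_approximations, is_coherent):
    """Return the immediate simplifications of model by approximating."""
    result = []
    for m in sorted(model):
        for m_prime in immediate_approximations.get(m, []):
            # Replace m by one of its immediate approximations.
            candidate = (model - {m}) | {m_prime}
            if is_coherent(candidate):
                result.append(candidate)
    return result


# Hypothetical instance with two assumption classes.
approximations = {
    "resistor": ["ideal-conductor"],
    "thermal-expansion": ["rigid-body"],
}


def toy_is_coherent(model):
    # Hypothetical coherence test: these two fragments cannot coexist.
    return not {"ideal-conductor", "thermal-expansion"} <= model


model = frozenset({"resistor", "thermal-expansion"})
simplifications = simp_by_approximating(model, approximations, toy_is_coherent)
```

Only one of the two candidate replacements survives the coherence test in this example, which mirrors the role of the coherence check in the inner loop of Figure 5.7.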

Lemma 5.9 The simp-by-approximating function returns all the immediate
simplifications of a model by approximating.

Proof: Immediate from the definition of immediate simplification by approximating. □

In  addition,  one  can  see  that  the  immediate  simplifications  of  M  by  approximating 
can  be  computed  in  polynomial  time. 

Lemma 5.10 Let I be an instance of the MINIMAL CAUSAL MODEL problem that
satisfies all the restrictions in Definition 5.8, and let M ⊆ ℳ be a coherent model.
Then simp-by-approximating(M, I) terminates in time polynomial in the size of I.

Proof: Lemma 4.1 tells us that checking whether or not M' is coherent can be done in
polynomial time. The number of times this check has to be made is equal to the number
of immediate approximations of the model fragments in M. Since the model fragments
in M are in different assumption classes, it follows that this number is bounded
by the number of model fragments in ℳ. Hence, simp-by-approximating(M, I)
terminates in time polynomial in the size of I. □

Finding  a  simplest  causal  model  by  approximating 

The above lemmas, in conjunction with a modified version of the
find-minimal-causal-model function in Section 5.1, allow us to efficiently find a simplest causal model
by approximating. A simplest causal model by approximating is a causal model that
contains a model fragment from each assumption class, such that no causal model
that contains a model fragment from each assumption class is strictly simpler than
it.

Definition 5.10 (Simplest causal model by approximating) Let I be an
instance of the MINIMAL CAUSAL MODEL problem and let M ⊆ ℳ be a causal model.
M is said to be a simplest causal model by approximating if and only if (a) M contains
a model fragment from each assumption class in A; and (b) if M' is a causal model
of I that contains a model fragment from each assumption class in A, then M' is not
strictly simpler than M.

Intuitively, a simplest causal model by approximating models each phenomenon as
approximately as possible. A simplest causal model by approximating can be identified
using a modified version of the find-minimal-causal-model function, in which the
call to the simplifications function is replaced by a call to the simp-by-approximating
function. The correctness of this modified function follows from

1.  Lemma  5.8,  which  tells  us  that  the  only  immediate  simplifications  of  a  model 
that  do  not  drop  any  assumption  classes  are  the  immediate  simplifications  of 
the  model  by  approximating; 

2. Lemma 5.9, which tells us that the simp-by-approximating function returns all
the immediate simplifications of a model by approximating;

3. Theorem 5.3, which tells us that I satisfies the upward failure property;^

4. Lemma 5.1, which proves the correctness of the find-minimal-causal-model
function.

In  addition,  this  modified  function  finds  a  simplest  causal  model  by  approximating 
in  polynomial  time.  This  follows  from: 

1.  Lemma  5.10,  which  tells  us  that  the  immediate  simplifications  of  a  model  by 
approximating  can  be  computed  in  polynomial  time;  and 

2. Lemma 5.2, which tells us that find-minimal-causal-model terminates in
polynomial time if the immediate simplifications can be computed in polynomial
time.

Hence,  we  have  the  following  lemma: 

Lemma 5.11 Let I be an instance of the MINIMAL CAUSAL MODEL problem that
satisfies all the restrictions in Definition 5.8. A simplest causal model by approximating
of I can be found in time polynomial in the size of I.

Proof:  Immediate  from  the  above  discussion.  □ 
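
The search described above can be sketched in Python. Figure 5.1 is not reproduced in this section, so the loop below is an assumed structure rather than the original find-minimal-causal-model function: it relies on the upward failure property to justify a greedy descent that moves to any immediate simplification that is still a causal model and stops when none exists. The fragment names, the toy approximation chains, and the toy causal-model test are hypothetical.

```python
def find_simplest_by_approximating(most_accurate, simp_by_approximating, is_causal_model):
    """Greedy descent from the most accurate model (assumed structure)."""
    if not is_causal_model(most_accurate):
        return None  # by upward failure, no simpler model can succeed either
    current = most_accurate
    while True:
        for candidate in simp_by_approximating(current):
            if is_causal_model(candidate):
                current = candidate
                break
        else:
            # No immediate simplification is a causal model.
            return current


# Hypothetical approximation chains: a0 -> a1 -> a2 and b0 -> b1.
approximations = {"a0": ["a1"], "a1": ["a2"], "b0": ["b1"]}


def toy_simplifications(model):
    # Assumes every candidate is coherent, to keep the example small.
    return [frozenset((model - {m}) | {m2})
            for m in sorted(model)
            for m2 in approximations.get(m, [])]


def toy_is_causal_model(model):
    # Hypothetical: using a2 loses the expected causal relation.
    return "a2" not in model


simplest = find_simplest_by_approximating(
    frozenset({"a0", "b0"}), toy_simplifications, toy_is_causal_model)
```

Each iteration either strictly simplifies the current model or terminates, so the number of iterations is bounded by the total length of the approximation chains.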

^Actually, Theorem 5.2 is sufficient, since we are only simplifying by approximating.

5.7.2 Simplifying a model by dropping model fragments

We now consider the second step of the model simplification procedure, which involves
simplifying a model by dropping model fragments. We start by showing the second
important consequence of Theorem 5.4. In particular, we show that if M is a simplest
causal model by approximating, then a minimal causal model simpler than M is a
subset of M.

Lemma 5.12 Let I be an instance of the MINIMAL CAUSAL MODEL problem that
satisfies all the restrictions in Definition 5.8, and let M ⊆ ℳ be a simplest causal
model by approximating. Let M' ⊆ ℳ be a minimal causal model of I such that
M' < M. Then M' ⊆ M.

Proof: Since M is a simplest causal model by approximating, it follows that replacing
any model fragment in M by an immediate approximation results in a model that is
not complete. Hence, Theorem 5.4 tells us that no coherent model that is simpler
than M can have a model fragment that is an approximation of one of the model
fragments in M. Hence, since M' < M, it follows that M' ⊆ M. □

Let M be any simplest causal model by approximating. We will now construct a
set H of propositional horn constraints that must be satisfied by any causal model
that is simpler than M. H will allow us to construct a minimal causal model that is
simpler than M.

Definition 5.11 Let I be an instance of the MINIMAL CAUSAL MODEL problem that
satisfies all the restrictions in Definition 5.8, and let causes(p, q) be the expected
behavior. Let M ⊆ ℳ be any simplest causal model by approximating, and let F :
E(M) → P(M) be any onto causal mapping. Let H be a set of propositional horn
constraints defined as follows:

1. For each constraint of the form

m_1 ∧ m_2 ∧ ... ∧ m_n ⇒ A

in C, where each m_i ∈ M, 1 ≤ i ≤ n, H contains the horn constraint

m_1 ∧ m_2 ∧ ... ∧ m_n ⇒ m

where m = A ∩ M, i.e., m is the model fragment that is both in A and M;

2. Let p_1, p_2 ∈ P(M) be parameters and let e_1, e_2 ∈ E(M) be equations such that
F(e_1) = p_1 and F(e_2) = p_2. Let m_1, m_2 ∈ M be model fragments such that
e_1 ∈ m_1 and e_2 ∈ m_2. If p_1 ∈ P(e_2), so that p_2 directly causally depends on p_1,
then H contains the constraint

m_2 ⇒ m_1

3. Let e_q ∈ E(M) be an equation such that F(e_q) = q, and let m_q ∈ M be a model
fragment such that e_q ∈ m_q. Then H contains the constraint

m_q
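
The three points of Definition 5.11 can be sketched as a Python function that builds the set of horn constraints from a model, its onto causal mapping, and the constraints in C. The data representation and all fragment, equation, and parameter names below are hypothetical and serve only as an illustration. Each horn constraint is represented as a pair consisting of a frozenset of antecedent fragments and a consequent fragment; constraints of the form m ⇒ m are omitted because they are trivially satisfied.

```python
def build_horn_constraints(fragments, params_of, mapping, constraints, classes, q):
    """fragments: dict fragment -> set of its equations (the model M).
    params_of: dict equation -> set of parameters it mentions, P(e).
    mapping: dict equation -> parameter (the onto causal mapping F).
    constraints: list of (antecedent fragments, assumption class name), i.e. C.
    classes: dict assumption class name -> set of fragments.
    q: the effect parameter of the expected behavior causes(p, q).
    """
    in_model = set(fragments)
    fragment_of = {e: m for m, eqs in fragments.items() for e in eqs}
    determined_by = {p: e for e, p in mapping.items()}
    horn = set()
    # Point 1: specialize each applicable constraint of C to the fragment of M.
    for antecedent, class_name in constraints:
        if set(antecedent) <= in_model:
            (m,) = classes[class_name] & in_model
            horn.add((frozenset(antecedent), m))
    # Point 2: a fragment needs the fragments that determine its inputs.
    for e2, p2 in mapping.items():
        for p1 in params_of[e2] - {p2}:
            m1, m2 = fragment_of[determined_by[p1]], fragment_of[e2]
            if m1 != m2:
                horn.add((frozenset({m2}), m1))
    # Point 3: the fragment that determines q is always required.
    horn.add((frozenset(), fragment_of[determined_by[q]]))
    return horn


# Hypothetical model: a battery, a resistor, and a thermal fragment.
fragments = {"battery": {"e_v"}, "resistor": {"e_i", "e_r"}, "thermal": {"e_t"}}
params_of = {"e_v": {"V"}, "e_i": {"V", "I", "R"}, "e_r": {"R"}, "e_t": {"T", "I"}}
mapping = {"e_v": "V", "e_i": "I", "e_r": "R", "e_t": "T"}
constraints = [(["resistor"], "power-source")]
classes = {"power-source": {"battery"}, "resistance": {"resistor"}, "heat": {"thermal"}}
horn = build_horn_constraints(fragments, params_of, mapping, constraints, classes, "I")
```

In this example points 1 and 2 happen to produce the same constraint, resistor ⇒ battery, which the set representation stores only once.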

We  now  show  that  any  subset  of  M  is  a  causal  model  if  and  only  if  it  satisfies  all 
the  constraints  in  H. 

Lemma 5.13 Let M, F, H, p, q, e_q, and m_q be as in Definition 5.11. Let M' ⊆ M
be any model. Then M' is a causal model if and only if M' satisfies all the constraints
in H.

Proof: (⇒) First, let us assume that M' is a causal model. We show that M' satisfies
all the constraints in H.

Since M' is a causal model, it must satisfy all the constraints in C. Hence, since
M' ⊆ M, it follows that M' must satisfy all the constraints in H defined under point 1
of Definition 5.11.

Since M' ⊆ M, it follows that E(M') ⊆ E(M). Hence, let F' : E(M') → P(M')
be the restriction of F to the equations in E(M'). Since M' is a causal model, it
follows that F' is an onto causal mapping.

Let p_1, p_2, e_1, e_2, m_1, and m_2 be as in point 2 of Definition 5.11. If m_2 ∈ M',
then it follows that e_2 ∈ E(M'). Hence, since p_1 ∈ P(e_2), it follows that p_1 ∈ P(M').
Since F' is an onto causal mapping, and since F' is a restriction of F, it follows that
e_1 ∈ E(M'), with F'(e_1) = p_1. Hence, it follows that m_1 ∈ M'. Hence, the constraint
m_2 ⇒ m_1 in H is satisfied. Hence, the constraints in H, defined under point 2 of
Definition 5.11, are satisfied.

Since M' is a causal model, q ∈ P(M'). Since F' is an onto causal mapping,
and since F' is a restriction of F, it follows that e_q ∈ E(M'). Hence, it follows that
m_q ∈ M'. Hence, the constraint in H, defined under point 3 of Definition 5.11, is
satisfied.

Hence, M' satisfies all the constraints in H.

(⇐) Let us now assume that M' satisfies all the constraints in H. We now show
that M' is a causal model.

Since M' satisfies all the constraints in H defined under point 1, it follows that
M' satisfies all the constraints in C.

We now show that the equations of M' are complete. Since M' ⊆ M, it follows
that E(M') ⊆ E(M). Hence, let the causal mapping F' : E(M') → P(M') be the
restriction of F to the equations in E(M'). We now show that F' is an onto causal
mapping.

Let p_1 ∈ P(M') be a parameter. We show that there is an equation e_1 ∈ E(M'),
such that F'(e_1) = p_1. Let e_2 ∈ E(M') be an equation such that p_1 ∈ P(e_2), and let
m_2 ∈ M' be a model fragment such that e_2 ∈ m_2. Since M' ⊆ M, it follows that
e_2 ∈ E(M), and m_2 ∈ M. Since F is an onto mapping, let e_1 ∈ E(M) be an equation
such that F(e_1) = p_1, and let m_1 ∈ M be a model fragment such that e_1 ∈ m_1.
Hence, from point 2 in Definition 5.11, H contains the constraint m_2 ⇒ m_1. Since
M' satisfies all the constraints in H, and since m_2 ∈ M', it follows that m_1 ∈ M'.
Hence, e_1 ∈ E(M') with F'(e_1) = p_1. Hence, M' is a complete model.

It  is  easy  to  see  from  the  above  discussion  that  if  p2  directl}’  causally  depends 
on  Pi  according  to  F,  i.e.,  (pi,p2)  €  Cp,  and  if  p2  €  P{M'),  then  pi  €  P{M')  and 
{PiiP2)  €  Cp’-  We  use  this  observation  to  show  that  M'  is  a  causal  model,  i.e., 
(p,9)€C(F(M')). 

Since M is a causal model, it follows that (p, q) ∈ C(E(M)). Hence, there exists
a sequence of parameters p_0, p_1, ..., p_n ∈ P(M) such that p_0 = p, p_n = q, and
(p_i, p_{i+1}) ∈ C_F, 0 ≤ i ≤ (n − 1). Since M' satisfies the constraint under point 3
of Definition 5.11, it means that m_q ∈ M', and hence q ∈ P(M'). Hence, from
the observations in the previous paragraph, a simple inductive argument shows that
(p_i, p_{i+1}) ∈ C_F', 0 ≤ i ≤ (n − 1). Hence, by transitivity, (p, q) ∈ C(E(M')).

Hence, M' is a causal model. Hence, M' is a causal model if and only if M'
satisfies all the constraints in H. □

In conjunction with Lemma 5.12, the above lemma tells us that, if M is a simplest
causal model by approximating, then a minimal causal model that is simpler than
M is just a smallest subset of M that satisfies all the constraints in H. Since all
the constraints in H are Horn clauses, it is easy to show that there is exactly one
such minimal causal model.* Given the set H of propositional Horn constraints,
the minimal causal model can be computed using a procedure that is completely
analogous to Boolean constraint propagation (BCP) [McAllester, 1980]. Figure 5.8
describes an algorithm, based on BCP, for finding a minimal causal model that is
simpler than M.


function simplify-by-dropping(M, I)
    /* M is assumed to be a simplest causal model by approximating */
    Construct H from M and I as described in Definition 5.11
    result ← nil
    while there exists a constraint h ∈ H such that
          the model fragments in the antecedent of h are in result and
          the model fragment in the consequent of h is not in result do
        Add the model fragment in the consequent of h to result
    endwhile
    return result
end


Figure  5.8:  Simplifying  a  model  by  dropping  model  fragments. 
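
The loop of Figure 5.8 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: each constraint in H is taken to be a pair (antecedents, consequent) whose consequent is a single model fragment, a constraint with an empty antecedent acts as a fact, and the fragment names in the example are hypothetical. The per-constraint counters are one standard way of obtaining running time linear in the total size of the constraints; they are not taken from the text.

```python
from collections import defaultdict, deque


def simplify_by_dropping(constraints):
    """Return the least set of fragments satisfying all Horn constraints.

    `constraints` is a list of (antecedents, consequent) pairs; `antecedents`
    is an iterable of fragment names (possibly empty) and `consequent` is a
    single fragment name. This representation is an assumption of the sketch.
    """
    remaining = []                # antecedents of each constraint not yet in result
    watchers = defaultdict(list)  # fragment -> indices of constraints mentioning it
    queue = deque()               # consequents of constraints that have fired

    for idx, (antecedents, consequent) in enumerate(constraints):
        ants = set(antecedents)
        remaining.append(len(ants))
        for fragment in ants:
            watchers[fragment].append(idx)
        if not ants:              # empty antecedent: fires immediately
            queue.append(consequent)

    result = set()
    while queue:
        fragment = queue.popleft()
        if fragment in result:
            continue
        result.add(fragment)
        # Each constraint is touched once per distinct antecedent.
        for idx in watchers[fragment]:
            remaining[idx] -= 1
            if remaining[idx] == 0:
                queue.append(constraints[idx][1])
    return result


# Hypothetical constraint set: m_q must be present, m_q requires m_a,
# m_c is required only if both m_a and m_b are present.
example_constraints = [
    ((), "m_q"),
    (("m_q",), "m_a"),
    (("m_a", "m_b"), "m_c"),
    (("m_d",), "m_e"),
]
minimal_model = simplify_by_dropping(example_constraints)
```

In the hypothetical example, m_c and m_e are never added, because m_b and m_d are never forced into the result; a fragment enters the result only when some constraint requires it.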


* This result is just a special case of the more general result from logic programming, that a
consistent set of Horn clauses has a unique minimal Herbrand model [van Emden and Kowalski,
1976].

BCP is known to run in time linear in the number of constraints [McAllester,
1980]. Hence, we have the following:


Lemma 5.14 Let I be an instance of the MINIMAL CAUSAL MODEL problem that
satisfies all the restrictions in Definition 5.8, and let M ⊆ ℳ be a simplest causal
model by approximating. Then simplify-by-dropping(M, I) returns a minimal causal
model of I in time polynomial in the size of I.

Proof: It is easy to see that simplify-by-dropping(M, I) returns a model that satisfies
all the constraints in H. Hence, it returns a causal model. It is also easy to see that
the returned model is a minimal causal model, because a model fragment is added to
result if and only if it has to be added to satisfy some constraint in H.

From the definition of H (Definition 5.11), it is easy to see that H can be computed
from M and I in time polynomial in the size of I. The while loop in
simplify-by-dropping is identical to BCP, and hence terminates in time linear in the number
of constraints in H [McAllester, 1980]. Hence, simplify-by-dropping(M, I) returns a
minimal causal model of I in time polynomial in the size of I. □

Hence, we can efficiently identify a minimal causal model by first finding a simplest
causal model by approximating, as described in Section 5.7.1, and then finding a
minimal causal model using simplify-by-dropping.

Theorem 5.5 Let I be an instance of the MINIMAL CAUSAL MODEL problem that
satisfies all the restrictions in Definition 5.8. A minimal causal model of I can be
found in time polynomial in the size of I.

Proof: Lemma 5.11 tells us that a simplest causal model by approximating can be
found in polynomial time. Lemma 5.14 tells us that this simplest causal model by
approximating can be used to find a minimal causal model in polynomial time. Hence,
a minimal causal model of I can be found in time polynomial in the size of I. □


5.8 Summary

In this chapter we identified a special case of the MINIMAL CAUSAL MODEL problem for
which a minimal causal model could be found efficiently. We started, in Section 5.1,
by  introducing  the  upward  failure  property,  which  states  that  if  a  coherent  model 
is  not  a  causal  model,  then  no  simpler  model  is  a  causal  model.  We  showed  that 
if  (a)  the  upward  failure  property  is  satisfied;  and  (b)  the  immediate  simplifications 
of  a  coherent  model  can  be  generated  in  polynomial  time;  then  a  minimal  causal 
model  can  be  found  in  polynomial  time  using  the  function  find-minimal-causal-model 
shown  in  Figure  5.1.  Unfortunately,  in  general,  it  is  difficult  to  decide  whether  or 
not  the  upward  failure  property  is  satisfied,  and  whether  or  not  coherent  models  have 
a  polynomial  number  of  immediate  simplifications.  Hence,  the  rest  of  the  chapter 
focuses  on  finding  efficient  characterizations  of  these  properties. 

Section 5.3 introduced a new class of approximations, called causal approximations,
that are commonly found in modeling the physical world. When all the approximations
are causal approximations, the causal relations entailed by a model decrease
monotonically as model fragments are replaced by their approximations. Hence, if
a model does not explain the expected behavior, the simpler model, with a subset
of causal relations, also does not explain the expected behavior. In addition, causal
approximations are particularly useful because they are commonly found in modeling
the physical world. Appendix A gives a list of commonly used approximations, all of
which are causal approximations.

Section  5.4  introduced  the  ownership  constraints.  These  constraints  ensure  that 
coherent  models  have  model  fragments  from  all  possibly  relevant  assumption  classes. 
The  ownership  constraints,  in  conjunction  with  causal  approximations,  are  sufficient 
to  ensure  that  the  upward  failure  property  is  satisfied.  Section  5.5  then  showed  that  if 
the  model  fragments  of  a  coherent  model  cannot  be  individually  approximated,  then 
there  is  no  point  approximating  them  two  or  more  at  a  time. 

Section 5.6 introduced a syntactic restriction on the expressive power of the propositional
coherence constraints in C. The constraints are restricted to have the form
specified in Equation 5.5.

Finally, Section 5.7 put all the restrictions together, and presented an efficient
algorithm for finding a minimal causal model. Definition 5.8 summarizes all the
restrictions introduced in this chapter. The efficient algorithm for finding a minimal
causal model is a two-step procedure. First, a simplest model is found by simplifying
the most accurate model as much as possible, without dropping any assumption
classes. Second, a minimal causal model is found by dropping all unnecessary model
fragments from the model identified in the first step.


Chapter  6 

Differential  equations 


In this chapter we generalize the results of the previous chapter to include models
involving differential equations. The complication introduced by the use of differential
equations is that the causal ordering is not generated from the set E of equations,
but rather from the set ic(E), the integration completion of E. The results of the
previous chapter do not take into account the additional int equations in ic(E). We
remedy that situation in this chapter.

The central result of the previous chapter was the efficient model selection algorithm
developed in Section 5.7. A careful analysis of the proofs of the theorems
and lemmas of Section 5.7 reveals that their correctness is based on the results of
Section 5.1, Theorem 5.3, and Theorem 5.4. Of these results, only Theorems 5.3
and 5.4 were based on the assumption that the models did not involve differential
equations. (The former proves that the upward failure property is satisfied, while the
latter proves that model fragments can be individually approximated.)

A further analysis of the proofs of these theorems reveals that Theorem 5.3 depends
on Theorem 5.2, which proves the restricted version of the upward failure
property, and Theorem 5.4 depends on Lemma 5.3, which guarantees the existence
of a local extension. Furthermore, the assumption that the models do not involve
differential equations is restricted to these two results. Hence, in this chapter, we will
generalize Theorem 5.2 and Lemma 5.3, so that the efficient model selection algorithm
developed in Section 5.7 can be used for models involving differential equations.

Section 6.1 introduces a canonical form for differential equations, and discusses its
consequences. This canonical form is similar to the one commonly used in numerical
integration. Section 6.2 discusses the different ways of approximating a differential
equation: exogenizing and equilibrating. These two approximation methods are discussed
in detail in Sections 6.3 and 6.4, respectively. The latter section concludes
with an updated definition of a causal approximation. Section 6.5 uses this updated
causal approximation definition to generalize Theorem 5.2 to models with differential
equations.

Finally, Section 6.6 generalizes Lemma 5.3. It first shows that, in general, a
coherent model can have an exponential number of immediate simplifications that
use model fragments from the same assumption classes. It then introduces locally
self-regulating parameters, and shows that when all the parameters are locally
self-regulating, Lemma 5.3 can be generalized to model fragments with differential
equations.


6.1  Canonical  form 

For many purposes, e.g., numerical integration [Dahlquist et al., 1974] and causal
ordering as defined in [Iwasaki, 1988], sets of differential equations are required to
be in canonical form. A set of first-order differential equations is in canonical form
if each derivative occurs in exactly one equation. For our purposes, we weaken this
condition slightly. We shall say that a set of first-order differential equations is in
canonical form if each derivative can be causally determined by exactly one equation.
Hence, we allow a derivative to occur in more than one equation, though exactly one
equation can causally determine it. To ensure that the equations of all device models
are in canonical form, we assume that the set of model fragments under consideration
is in canonical form:

Definition 6.1 (Canonical form) A set of model fragments is said to be in canonical
form if and only if the following conditions are satisfied:

1. All derivatives are local parameters; and

2. If derivative dp/dt is local to model fragment m, then dp/dt can be causally
determined by exactly one equation in m.

Condition 1 ensures that derivatives can be determined by the equations of model
fragments in exactly one assumption class, while condition 2 ensures that exactly one
equation in each such model fragment can determine it. Hence, the above restrictions
ensure that the equations of all device models are in canonical form. Hence, we place
the following restriction on I:

• The set of model fragments in ℳ is in canonical form.

An important consequence of the above restriction is as follows. Let dp/dt be a
derivative that is local to model fragment m, and let e be the equation of m that can
causally determine dp/dt. The integration completion of any set of equations that
includes e will introduce the equation int(p, dp/dt). Since dp/dt is local to m, this is
exactly equivalent to augmenting the equations of m with the equation int(p, dp/dt).
In fact, if we do this for all derivatives in all model fragments, there is no need
to explicitly apply the integration completion operator to a set of equations, because the
model fragments would already include the equations introduced by the integration
completion.
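
The augmentation described above can be sketched in Python. The sketch rests on assumptions that are not in the text: an equation is represented only by a label and its set of parameters, the derivative of a parameter p is written as the string "dp/dt", and the integration completion is taken to add int(p, dp/dt) for every derivative that appears. The heat-flow fragment used as an example is hypothetical.

```python
import re
from typing import NamedTuple


class Equation(NamedTuple):
    label: str         # human-readable form of the equation
    params: frozenset  # P(e): the parameters mentioned by the equation


# Assumed naming convention: the derivative of parameter "T" is "dT/dt".
DERIVATIVE = re.compile(r"^d(\w+)/dt$")


def integration_completion(equations):
    """Return `equations` plus int(p, dp/dt) for each derivative dp/dt in them."""
    completed = set(equations)
    for eq in equations:
        for param in eq.params:
            match = DERIVATIVE.match(param)
            if match:
                p = match.group(1)
                completed.add(
                    Equation(f"int({p}, {param})", frozenset({p, param}))
                )
    return completed


# Hypothetical model fragment with one differential equation.
fragment = {Equation("dT/dt = F/C", frozenset({"dT/dt", "F", "C"}))}
augmented = integration_completion(fragment)
labels = sorted(eq.label for eq in augmented)
```

Because the derivative is local to the fragment, augmenting the fragment once in this way gives the same int equation that the completion of any model containing the fragment would introduce.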

The advantage of the above viewpoint is as follows. Let m1 and m2 be model
fragments such that m2 is a causal approximation of m1, as in Definition 5.3. Suppose
that every derivative that is local to m1 is also local to m2. If we were to augment the
equations of m1 and m2 with the int equations, as above, it is easy to see that the
same set of equations is added to both m1 and m2. Hence, it is straightforward to
extend the correspondence mapping between the equations of m1 and m2, to include
these additional equations: if dp/dt is a derivative local to both m1 and m2, the
correspondence mapping is extended to map the equation int(p, dp/dt) in m2 to the
equation int(p, dp/dt) in m1. It is easy to see that this extension satisfies all the
conditions of a correspondence mapping. Hence, if the equations of m1 and m2 are
augmented as above, m2 is still a causal approximation of m1.

In summary, if the same set of derivatives is local to m1 and m2, whenever m2
is a causal approximation of m1, then m2 remains a causal approximation of m1 even
after we augment the equations of all model fragments as above. This means that
all the results that we have proved in the previous chapter remain true, even if we
include differential equations!

However, it may not always be the case that the same set of derivatives is local
to m1 and m2. We now discuss this important case.


6.2  Approximating  differential  equations 

Let m1 and m2 be model fragments such that m2 is an approximation of m1. Let
dp/dt be a derivative that is local to m1, but not local to m2. Intuitively, m1 describes
a phenomenon using a dynamic model, i.e., a model involving differential equations,
while m2 approximates this description by describing the phenomenon using a static,
or equilibrium, model. We will now consider two types of approximations, called
exogenizing and equilibrating, that convert a dynamic description of a phenomenon
into an equilibrium description. These two types of approximations were identified
by Iwasaki in [Iwasaki, 1988].

The  basic  idea  behind  these  approximation  techniques  is  that,  when  viewed  at  the 
right  time-scale,  a  dynamic  system  can  appear  to  be  in  equilibrium.  Iwasaki  considers 
two  cases: 

1. The dynamic behavior of a system is slow compared to the time-scale of interest.
For example, car brakes wear out over a period of years, while the time-scale of
interest may be only a matter of hours or days. Hence, at this time-scale, the
thickness of the brake pads can be assumed to be constant. Assuming that a
parameter does not change, because its dynamic behavior is much slower than
the time-scale of interest, is called exogenizing.

2. The dynamic behavior of a system is fast compared to the time-scale of interest.
For example, the dynamic behavior of the temperature of a small object
lasts only a few minutes, after which it reaches thermal equilibrium with its
environment. Hence, at a time-scale of hours, the object’s temperature can be
assumed to “instantaneously” track the environment’s temperature. Assuming
that a parameter is always in equilibrium, because its dynamic behavior is much
faster than the time-scale of interest, is called equilibrating.

Let  us  now  consider  the  effect  that  exogenizing  and  equilibrating  have  on  the 
differential  equations  describing  a  dynamic  phenomenon.  Let 

dp/dt = f(P)    (6.1)

be a differential equation, where f is some function of the set P of parameters, such
that P does not contain dp/dt. Exogenizing the equation involves assuming that, at
the time-scale of interest, the value of p does not change significantly. Hence, the
above equation is replaced by

exogenous(p)

Equilibrating Equation 6.1 involves assuming that p “quickly” reaches equilibrium,
and hence dp/dt is always 0. Hence, we replace Equation 6.1 by

0 = f(P)

More  generally,  we  have  the  following  definitions  of  exogenizing  and  equilibrating: 

Definition 6.2 (Exogenizing and equilibrating) Let e be a differential equation
that can causally determine the derivative dp/dt, i.e., dp/dt ∈ P_c(e).

• Exogenizing e involves replacing it with the equation exogenous(p).

• Equilibrating e involves replacing it with an equation e' such that (a) dp/dt ∉
P(e'); (b) P(e') ⊆ P(e); and (c) P_c(e') ⊆ P_c(e).

Note  that,  in  both  exogenizing  and  equilibrating,  the  resulting  equation  does  not 
contain  dp/dt.  Note,  also,  the  slight  generalization  in  our  definition  of  equilibration — 
we  do  not  require  e'  to  be  the  result  of  modifying  e  by  replacing  dp/dt  with  0. 

For example, the equation describing the dynamic behavior of the temperature of
an object is:

dT/dt = F/C    (6.2)

where T is the object’s temperature, C is its heat capacity, and F is the net heat flow
into the object. Exogenizing this equation results in:

exogenous(T)

which  states  that  the  temperature  is  constant.  Equilibrating  that  equation  results  in: 

F  =  0 

which  states  that  the  object’s  temperature  always  changes  to  ensure  that  the  net  heat 
flow  into  the  object  is  0. 
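
The two operations of Definition 6.2 can be illustrated on the temperature example with a small Python sketch. Everything about the representation is an assumption made for illustration: an equation is reduced to a label, its parameter set P(e), and the set P_c(e) of parameters it can causally determine; the temperature equation is written as dT/dt = F/C; and the particular P_c sets below are hypothetical choices.

```python
from typing import NamedTuple


class Equation(NamedTuple):
    label: str
    params: frozenset        # P(e)
    determinable: frozenset  # P_c(e), parameters e can causally determine


def exogenize(p):
    """Build the equation exogenous(p) that replaces a differential equation."""
    return Equation(f"exogenous({p})", frozenset({p}), frozenset({p}))


def is_equilibration(e, e_prime, p):
    """Check conditions (a)-(c) of Definition 6.2 for a candidate e'."""
    derivative = f"d{p}/dt"
    return (
        derivative in e.determinable                  # e can determine dp/dt
        and derivative not in e_prime.params          # (a)
        and e_prime.params <= e.params                # (b)
        and e_prime.determinable <= e.determinable    # (c)
    )


# Hypothetical encoding of the temperature equation and two candidates.
heat = Equation(
    "dT/dt = F/C",
    frozenset({"dT/dt", "F", "C"}),
    frozenset({"dT/dt", "F"}),
)
equilibrated = Equation("F = 0", frozenset({"F"}), frozenset({"F"}))
not_allowed = Equation("T = T_env", frozenset({"T", "T_env"}), frozenset({"T"}))

exogenized = exogenize("T")
ok = is_equilibration(heat, equilibrated, "T")
rejected = is_equilibration(heat, not_allowed, "T")
```

Under these assumptions F = 0 qualifies as an equilibrated version of the temperature equation, while T = T_env does not, because it mentions parameters that the original equation does not contain.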

We  now  investigate  the  effect  that  these  types  of  approximations  have  on  the 
results  of  the  previous  chapter. 


6.3  Exogenizing  differential  equations 

Let m1 and m2 be model fragments such that m2 is an approximation of m1. Let
dp/dt be a derivative that is local to m1, but not local to m2. Let e ∈ m1 be the
equation that can causally determine dp/dt. Assume that e has been exogenized in
m2, i.e., e has been replaced by exogenous(p) in m2.

Now, let us assume that, if m1 had not contained e and m2 had not contained
exogenous(p), then m2 would have been a causal approximation of m1 according to
Definition 5.3. Let G be the correspondence mapping of this hypothetical causal
approximation, and let L be the local causal mapping with respect to G.

We now claim that m2 (with exogenous(p)) is a causal approximation of m1 (with
e), if we augment the equations of m1 to include the equation int(p, dp/dt). In
particular, the correspondence mapping G can be extended to map exogenous(p) in
m2 to int(p, dp/dt) in m1, and the local causal mapping L can be extended to map
the equation e to the parameter dp/dt, which is local to m1, but not local to m2. But
we have already seen that m1 can be viewed as containing the equation int(p, dp/dt).
Hence, m2 is a causal approximation of m1.

More generally, we can modify Definition 5.3 so that the domain and range of the
correspondence mapping are not the equations of the model fragments, but the integration
completion of the equations of the model fragments. With this modification,
it is easy to see from the above argument that exogenizing differential equations
is a causal approximation. Hence, all the results of the previous chapter continue to
hold, even when we allow differential equations to be approximated by exogenizing.

In the next section, where we discuss equilibration, we will give an updated definition
of a causal approximation that incorporates the above change.


6.4  Equilibrating  differential  equations 

Let m1 and m2 be model fragments such that m2 is an approximation of m1. Let
dp/dt be a derivative that is local to m1, but not local to m2. Let e1 ∈ m1 be the
equation that can causally determine dp/dt. Assume that e1 has been equilibrated in
m2, and let e2 be the equilibrated version of e1.

Let us now assume that, if m1 had not contained e1 and m2 had not contained e2,
then m2 would have been a causal approximation of m1 according to Definition 5.3.
Let G be the correspondence mapping of this hypothetical causal approximation, and
let L be the local causal mapping with respect to G.

It is now tempting to proceed as we did in the case of exogenizing. In particular,
we can argue that, when m1 does contain e1 (and hence implicitly int(p, dp/dt)) and
m2 does contain e2, we can extend G by mapping e2 to e1. Similarly, we can extend
L by mapping int(p, dp/dt) to p.

But L can map int(p, dp/dt) to p only if p is local to m1, but not local to m2.
This means that p cannot be determined by equilibrium equations: p can only be
determined by an int equation, or p can be constant as a result of exogenizing equations
like e. This is an undesirable state of affairs. For example, the equilibrium
temperature of an object cannot be determined by equilibrating Equation 6.2.

Hence, we do not take the above approach. Rather, we show that even if we do
not extend L to map int(p, dp/dt) to p, but we do extend G to map e2 to e1, then
Theorem 5.2 continues to remain true. Before we do this, we update the definition of
a causal approximation, incorporating the various changes that we have discussed.

Definition 6.3 (Causal approximation with differential equations) A model
fragment m2 is said to be a causal approximation of a model fragment m1 if and only
if:

1. m2 is an approximation of m1;

2. if dp/dt is a derivative that is local to m1 but not local to m2, then the equation
that can causally determine dp/dt in m1 is either exogenized or equilibrated in
m2;

3. There exists a 1-1 mapping G : ic(m2) → ic(m1) that satisfies the following
properties:

   (a) for each e ∈ ic(m2), P(e) ⊆ P(G(e)), and P_c(e) ⊆ P_c(G(e));

   (b) if m1 contains a differential equation e that can causally determine derivative
   dp/dt, and if exogenous(p) ∈ m2 is the exogenized version of e, then
   G(exogenous(p)) = int(p, dp/dt); and

   (c) if m1 contains a differential equation e that can causally determine derivative
   dp/dt, and if e' ∈ m2 is the equilibrated version of e, then G(e') = e.

   G is called a correspondence mapping, and e and G(e) are called corresponding
   equations.

4. Let E* be the set of equations in m1 that have no corresponding equations according
to G. Let P* denote the set of parameters such that q ∈ P* if and only if

   (a) q is local to m1;

   (b) q is not local to m2; and

   (c) if p is a parameter, dp/dt is its derivative, and the equation that can
   causally determine dp/dt in m1 has been equilibrated in m2, then q is neither
   p nor dp/dt.

   Then there exists an onto causal mapping L : E* → P*. L is called a local
   causal mapping with respect to G.

It is easy to verify that when m1 and m2 have no differential equations, the above
definition reduces to Definition 5.3. The changes are as follows. Condition 2 ensures
that the only way differential equations can be approximated is by either exogenizing
them or equilibrating them. Condition 3b incorporates the change to G discussed in
Section 6.3, while Condition 3c incorporates the restriction to G discussed earlier in
this section.

Condition 4 ensures that the “tempting” extension to L, discussed earlier in this
section, is disallowed. In particular, note that E* contains only equations in m1.
Hence, equations of the form int(p, dp/dt) are not in the domain of L.

Condition 4c ensures that parameter p and derivative dp/dt, where the equation
e that can causally determine dp/dt has been equilibrated in m2, are not in the
co-domain of L. This is necessary because e is the only equation in m1 that can causally
determine dp/dt, and G already matches e to its equilibrated version. Hence, no
equation in the domain of L can causally determine dp/dt. Similarly, if p were local
to m1, but not local to m2, then it would be okay to be “tempted” as discussed earlier,
and extend L to map int(p, dp/dt) to p. However, since int(p, dp/dt) is not in the
domain of L, it makes sense to leave p out of the co-domain of L.
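
To make conditions 3(a) and 4 of Definition 6.3 concrete, the following Python sketch checks a candidate correspondence mapping and computes the sets E* and P*. It is only a sketch under assumptions: equations are identified by string labels, P and Pc are dictionaries giving P(e) and P_c(e), the temperature fragment is hypothetical, and we assume that dT/dt is the only parameter local to m1.

```python
def is_correspondence_mapping(G, P, Pc):
    """Check that G is 1-1 and satisfies condition 3(a) of Definition 6.3."""
    images = list(G.values())
    if len(images) != len(set(images)):
        return False
    return all(P[e] <= P[G[e]] and Pc[e] <= Pc[G[e]] for e in G)


def local_mapping_sets(m1_equations, G, local_m1, local_m2, equilibrated):
    """Return (E*, P*) as described in condition 4 of Definition 6.3.

    `equilibrated` lists the parameters p whose defining differential
    equation in m1 has been equilibrated in m2.
    """
    e_star = set(m1_equations) - set(G.values())
    excluded = set()
    for p in equilibrated:
        excluded |= {p, f"d{p}/dt"}
    p_star = (set(local_m1) - set(local_m2)) - excluded
    return e_star, p_star


# Hypothetical parameter sets for the temperature example.
P = {
    "dT/dt = F/C": {"dT/dt", "F", "C"},
    "int(T, dT/dt)": {"T", "dT/dt"},
    "exogenous(T)": {"T"},
    "F = 0": {"F"},
}
Pc = {
    "dT/dt = F/C": {"dT/dt", "F"},
    "int(T, dT/dt)": {"T"},
    "exogenous(T)": {"T"},
    "F = 0": {"F"},
}
m1 = ["dT/dt = F/C"]  # the int equation belongs to ic(m1), not to m1 itself

# Exogenizing: G maps exogenous(T) to int(T, dT/dt) (condition 3b).
G_exo = {"exogenous(T)": "int(T, dT/dt)"}
exo_sets = local_mapping_sets(m1, G_exo, {"dT/dt"}, set(), [])

# Equilibrating: G maps F = 0 to the differential equation (condition 3c).
G_eq = {"F = 0": "dT/dt = F/C"}
eq_sets = local_mapping_sets(m1, G_eq, {"dT/dt"}, set(), ["T"])
```

In the exogenizing case E* holds the differential equation and P* holds dT/dt, so L can map one to the other. In the equilibrating case both sets are empty, which reflects that neither the int equation nor the parameters T and dT/dt take part in L.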


6.5  Monotonicity  of  causal  relations 

In this section we show that Theorem 5.2 remains true with the above updated
definition of a causal approximation. We start by proving a number of subsidiary
lemmas.

Recall that a causal mapping H : E → P is said to be partial if and only if H is
not defined for each equation in E. In the next two lemmas, we define a condition
under which a partial causal mapping can be extended to a causal mapping defined
over more equations, such that the resulting causal mapping entails a superset of
causal relations. We will first motivate these two lemmas with an example.

Let E be a set of equations:

E = {du/dt = v, dv/dt = u, dw/dt = w}    (6.3)

and let H : ic(E) → P(E) and H' : ic(E) → P(E) be partial causal mappings as
defined in Figure 6.1.

H(du/dt = v) = v        H'(du/dt = v) = du/dt
H(dv/dt = u) = u        H'(dv/dt = u) = dv/dt
H(dw/dt = w) = w        H'(dw/dt = w) = w
                        H'(int(u, du/dt)) = u
                        H'(int(v, dv/dt)) = v

Figure 6.1: Partial causal mappings H and H'.

Note  that  H'  is  defined  over  more  equations  than  H.  Figure  6.2  shows  the  bipartite 
graphs  representing  E,  and  the  two  causal  mappings  H  and  H'.  The  bold  lines  with 
arrowheads  at  each  end  represent  the  two  causal  mappings. 


[Figure 6.2: A motivating example. Two bipartite graphs connect the equations of ic(E), namely du/dt = v, dv/dt = u, dw/dt = w, int(u, du/dt), int(v, dv/dt), and int(w, dw/dt), to the parameters they contain: (a) causal mapping H; (b) causal mapping H'.]

Now consider S, an alternating sequence of equations and parameters, defined as
follows:

S = {int(v, dv/dt), v, du/dt = v, du/dt, int(u, du/dt), u, dv/dt = u, dv/dt}    (6.4)

One can verify that the partial causal mapping H' can be derived from the partial
causal mapping H using S as follows:

• If equation e is followed by parameter p in the sequence S, then H'(e) = p. For
example, int(v, dv/dt) is followed by v, and hence H'(int(v, dv/dt)) = v.

• If equation e is not in the sequence S, and if H is defined for e, then H'(e) =
H(e).

That is, if we view S as a path in Figure 6.2a, then Figure 6.2b is a result of the following
operations: (a) if an edge in S is bold with arrows, then convert it into a light edge
without arrows; and (b) if an edge in S is light without arrows, then convert it into
a bold edge with arrows.

One can also verify the following two properties: (a) according to the partial
causal mapping H', the parameters appearing in the sequence S (i.e., v, du/dt, u, and
dv/dt) are causally dependent upon each other; and (b) the direct causal dependencies
entailed by H are a subset of all the causal dependencies entailed by H', i.e., C_H ⊆
tc(C_H'). In the next two lemmas, we give a general characterization of such partial
causal mappings H and H', and sequence S, and show that the above two properties
hold. These lemmas will provide us with a mechanism to extend a partial causal
mapping like H.
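
The motivating example can be checked mechanically. The Python sketch below derives H' from H along S and verifies the two properties; it assumes a dictionary encoding of mappings keyed by equation labels, and it assumes that (q1, q2) is a direct causal dependency of a mapping whenever some equation e is mapped to q2 and q1 is a different parameter of e. Both choices are illustrative and not taken from the formal definitions.

```python
# P(e) for every equation in ic(E), with E as in Equation 6.3.
P = {
    "du/dt = v": {"du/dt", "v"},
    "dv/dt = u": {"dv/dt", "u"},
    "dw/dt = w": {"dw/dt", "w"},
    "int(u, du/dt)": {"u", "du/dt"},
    "int(v, dv/dt)": {"v", "dv/dt"},
    "int(w, dw/dt)": {"w", "dw/dt"},
}
H = {"du/dt = v": "v", "dv/dt = u": "u", "dw/dt = w": "w"}
S = ["int(v, dv/dt)", "v", "du/dt = v", "du/dt",
     "int(u, du/dt)", "u", "dv/dt = u", "dv/dt"]


def extend_along(H, S):
    """Derive H' from H: each equation in S maps to the parameter after it."""
    H_new = dict(H)  # equations outside S keep their value under H
    for equation, parameter in zip(S[0::2], S[1::2]):
        H_new[equation] = parameter
    return H_new


def direct_dependencies(mapping, P):
    """Pairs (q1, q2) where q2 = mapping[e] and q1 is another parameter of e."""
    return {(q1, q2) for e, q2 in mapping.items() for q1 in P[e] if q1 != q2}


def transitive_closure(pairs):
    closure = set(pairs)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra


H_prime = extend_along(H, S)
C_H = direct_dependencies(H, P)
tc_C_H_prime = transitive_closure(direct_dependencies(H_prime, P))
sequence_params = S[1::2]
```

With this encoding, H_prime agrees with the right-hand column of Figure 6.1, every pair of parameters in S is related in tc_C_H_prime, and C_H is contained in tc_C_H_prime.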

Lemma 6.1 Let E be a complete set of equations, and let H : ic(E) → P(E) be a
partial causal mapping. Let S be an alternating sequence of equations and parameters:

S = {e_1, p_1, e_2, p_2, ..., e_n, p_n}, for some n ≥ 1

such that

1. No equation or parameter is repeated in S;

2. p_i ∈ P_c(e_i), for 1 ≤ i ≤ n;

3. H is undefined for e_1;

4. p_n is not in the range of H, i.e., there is no e ∈ ic(E) such that H is defined
for e and H(e) = p_n;

5. for each e_i, 2 ≤ i ≤ n, if H is defined for e_i, then H(e_i) = p_{i−1}. If H is not
defined for e_i, then p_{i−1} ∈ P(e_i); and

6. p_n ∈ P(e_1).

Let H' : ic(E) → P(E) be a partial causal mapping defined as follows:

H'(e) = p_i          if e is e_i, 1 ≤ i ≤ n
H'(e) = H(e)         if e is not e_i, for any 1 ≤ i ≤ n, and H is defined for e
H'(e) is undefined   otherwise

Then, according to H', each parameter in S is causally dependent on every other
parameter in S, i.e., for every p_i, p_j, 1 ≤ i, j ≤ n, (p_i, p_j) ∈ tc(C_H').


Proof: Let $p_k, p_{k+1}$, $1 \le k \le n-1$, be any two consecutive parameters in the sequence
$S$. We first show that

$$(p_k, p_{k+1}) \in C_{H'}, \quad \text{for } 1 \le k \le n-1 \qquad (6.5)$$

i.e., according to $H'$, $p_{k+1}$ directly causally depends on $p_k$. From the definition of $H'$,
we know that $H'(e_{k+1}) = p_{k+1}$. Hence, we need only show that $p_k \in P(e_{k+1})$.

Condition 5 in the statement of the lemma tells us that if $H$ is defined for $e_{k+1}$,
then $H(e_{k+1}) = p_k$. Since $H$ is a causal mapping, it follows that $p_k \in P(e_{k+1})$. If $H$
is not defined for $e_{k+1}$, then condition 5 tells us that $p_k \in P(e_{k+1})$. Hence, in either
case $p_k \in P(e_{k+1})$. Hence, $(p_k, p_{k+1}) \in C_{H'}$.

Now, condition 6 in the statement of the lemma tells us that $p_n \in P(e_1)$. Since
$H'(e_1) = p_1$, it follows that $(p_n, p_1) \in C_{H'}$. Hence, using transitivity and the result
shown in Equation 6.5, we can conclude that for every $p_i, p_j$, $1 \le i, j \le n$, $(p_i, p_j) \in
tc(C_{H'})$. □
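
The construction in Lemma 6.1 can be checked mechanically on a small case. The following is a minimal sketch, not part of the original development: it assumes that $(q, H(e)) \in C_H$ exactly when $q \in P(e)$ and $q \ne H(e)$, and it uses a hypothetical four-equation system in $u$ and $v$ whose parameter sets `params_of` are our own illustrative choice. The function names are ours as well.

```python
from itertools import product

def direct_deps(mapping, params_of):
    """C_H: pairs (q, H(e)) where q is a parameter of e other than H(e)."""
    return {(q, p) for e, p in mapping.items() for q in params_of[e] if q != p}

def transitive_closure(pairs):
    """Compose pairs repeatedly until no new pair appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new

def extend(mapping, seq):
    """H' of Lemma 6.1: each e_i in S is sent to p_i; H is kept elsewhere."""
    extended = dict(mapping)
    for e, p in zip(seq[0::2], seq[1::2]):
        extended[e] = p
    return extended

# Assumed parameter sets P(e) for an illustrative system.
params_of = {
    "dv/dt = u": {"dv/dt", "u"},
    "du/dt = v": {"du/dt", "v"},
    "int(u, du/dt)": {"u", "du/dt"},
    "int(v, dv/dt)": {"v", "dv/dt"},
}
# Partial mapping H, undefined on both int equations.
H = {"dv/dt = u": "u", "du/dt = v": "v"}
# Alternating sequence S = (e1, p1, ..., e4, p4) satisfying conditions 1-6.
S = ["int(v, dv/dt)", "v", "du/dt = v", "du/dt",
     "int(u, du/dt)", "u", "dv/dt = u", "dv/dt"]

H_prime = extend(H, S)
closure = transitive_closure(direct_deps(H_prime, params_of))
seq_params = S[1::2]
mutually_dependent = all((a, b) in closure for a in seq_params for b in seq_params)
print(mutually_dependent)
```

In this example the direct dependencies of $H'$ form the cycle $v \to du/dt \to u \to dv/dt \to v$, which is why every pair of parameters in $S$ ends up in the transitive closure.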

Lemma 6.2 Let $E$, $S$, $H$, and $H'$ be as in Lemma 6.1. Then $C_H \subseteq tc(C_{H'})$, i.e.,
the direct causal dependencies entailed by $H$ are a subset of the transitive closure of
the direct causal dependencies entailed by $H'$.

Proof: Let $(q_1, q_2) \in C_H$. To prove this lemma, we need only show that $(q_1, q_2) \in
tc(C_{H'})$.

Since $(q_1, q_2) \in C_H$, it follows that there is an equation $e \in ic(E)$, such that
$H(e) = q_2$ and $q_1 \in P(e)$. If $e$ is not an equation in the sequence $S$, then by the
definition of $H'$, it follows that $H'(e) = H(e) = q_2$. Therefore, $(q_1, q_2) \in C_{H'}$, and
hence $(q_1, q_2) \in tc(C_{H'})$.

On the other hand, if $e$ is an equation in the sequence $S$, then let $H'(e) = p$,
where $p$ is some parameter in the sequence $S$. Since $H$ is defined for $e$, and $H(e) = q_2$,
condition 5 of Lemma 6.1 (and hence of this lemma) tells us that $q_2$ is also a parameter
in the sequence $S$. Hence, Lemma 6.1 tells us that, according to $H'$, $p$ and $q_2$ are
causally interdependent. In particular, we have

$$(p, q_2) \in tc(C_{H'}) \qquad (6.6)$$

Now, since $q_1 \in P(e)$, we have two cases: either $p$ is $q_1$, or $p$ is not $q_1$. If $p$ is $q_1$,
then Equation 6.6 directly tells us that $(q_1, q_2) \in tc(C_{H'})$. If $p$ is not $q_1$, then since
$q_1 \in P(e)$, and $H'(e) = p$, it follows that $(q_1, p) \in C_{H'}$. Hence, in conjunction with
Equation 6.6, we have $(q_1, q_2) \in tc(C_{H'})$.

Hence, in every case, we have $(q_1, q_2) \in tc(C_{H'})$. Hence, it follows that $C_H \subseteq
tc(C_{H'})$. □

The above two lemmas show that, under certain conditions, a partial causal mapping
$H$, and an alternating sequence of equations and parameters $S$, can be used
to extend $H$ to a causal mapping $H'$ that entails a superset of causal dependencies.
We now investigate conditions on $H$ which ensure that a sequence like $S$ exists,
so that $H$ can be extended to $H'$. In particular, we introduce augmentable causal
mappings:

Definition 6.4 (Augmentable causal mapping) Let $E$ be a complete set of equations,
and let $H : ic(E) \to P(E)$ be a partial causal mapping. $H$ is said to be
augmentable with respect to $E$ if and only if

1. every equation $e \in ic(E)$, for which $H$ is not defined, is of the form $int(q, dq/dt)$,
for some $q$; and

2. every parameter $p \in P(E)$ not in the range of $H$, is of the form $dq/dt$, such
that $int(q, dq/dt) \in ic(E)$ and $H$ is not defined for $int(q, dq/dt)$.

i.e., $H$ is not defined for some int equations, and the corresponding derivatives are
the only parameters not in the range of $H$.

One can verify that both the causal mappings $H$ and $H'$ shown
in Figure 6.2 are augmentable causal mappings. We now show that if $H$ is an augmentable
causal mapping, then there exists a sequence $S$ satisfying the conditions of
the above two lemmas. This will ensure that $H$ can be extended to $H'$, as discussed
earlier.
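
The two conditions of Definition 6.4 translate directly into a check on a finite mapping. The sketch below is illustrative only: the function name, the dictionary `int_eqs` (pairing each int equation with its derivative), and the example system are our own assumptions. We also require that $H$ be undefined on at least one equation, which is our reading of "partial" here.

```python
def is_augmentable(equations, parameters, mapping, int_eqs):
    """Check Definition 6.4 for a partial causal mapping H.

    equations  -- all equations in ic(E)
    parameters -- all parameters in P(E)
    mapping    -- dict: equation -> parameter (the partial mapping H)
    int_eqs    -- dict: int equation -> its derivative dq/dt
    """
    undefined = [e for e in equations if e not in mapping]
    in_range = set(mapping.values())
    outside_range = [p for p in parameters if p not in in_range]
    # Condition 1: H is undefined only on int equations.
    cond1 = all(e in int_eqs for e in undefined)
    # Condition 2: every parameter outside the range is the derivative of
    # an int equation on which H is undefined.
    free_derivatives = {int_eqs[e] for e in undefined if e in int_eqs}
    cond2 = all(p in free_derivatives for p in outside_range)
    # Assumption: a mapping that is defined everywhere is complete, not augmentable.
    return bool(undefined) and cond1 and cond2

equations = ["dv/dt = u", "du/dt = v", "int(u, du/dt)", "int(v, dv/dt)"]
parameters = ["u", "v", "du/dt", "dv/dt"]
int_eqs = {"int(u, du/dt)": "du/dt", "int(v, dv/dt)": "dv/dt"}

H = {"dv/dt = u": "u", "du/dt = v": "v"}
H_bad = {"dv/dt = u": "dv/dt", "int(u, du/dt)": "u", "int(v, dv/dt)": "v"}

print(is_augmentable(equations, parameters, H, int_eqs))
print(is_augmentable(equations, parameters, H_bad, int_eqs))
```

The second mapping fails condition 1 because it is undefined on `du/dt = v`, which is not an int equation.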

Lemma 6.3 Let $E$ be a complete set of equations, and let $H : ic(E) \to P(E)$ be
an augmentable causal mapping with respect to $E$. Then there exists an alternating
sequence of equations and parameters:

$$S = \{e_1, p_1, e_2, p_2, \ldots, e_n, p_n\}, \quad \text{for some } n \ge 1$$

that satisfies conditions 1-6 in the statement of Lemma 6.1, with respect to the causal
mapping $H$. In addition, let $int(q, dq/dt)$ be any equation in $ic(E)$ for which $H$ is
not defined. Then $int(q, dq/dt)$ occurs in the sequence $S$ if and only if the parameter
$dq/dt$ occurs in the sequence $S$.

Proof: The proof of this lemma is based on an understanding of the algorithm for
finding maximum matchings in bipartite graphs described in Appendix D.

Let $E_1 \subseteq ic(E)$ be the set of equations for which $H$ is defined, and let $P_1$ be the
range of $H$. Let $E_2 = ic(E) \setminus E_1$ and let $P_2 = P(E) \setminus P_1$. Since $H$ is augmentable,
it follows that $E_2$ and $P_2$ are not empty, and have the following form (where $m =
|E_2| = |P_2|$):

$$E_2 = \bigcup_{1 \le i \le m} \{int(q_i, dq_i/dt)\}$$

$$P_2 = \bigcup_{1 \le i \le m} \{dq_i/dt\}$$

Let $G = (X, Y, R)$ be the bipartite graph representing the equations in $ic(E)$ (see
Definition 3.5). Since $E$ is complete, it follows that any maximum matching in $G$ is
complete. Let $U$ be a matching in $G$ defined using $H$:

$$U = \{(e, H(e)) : e \in ic(E) \wedge H(e) \text{ is defined}\}$$

$U$ is clearly not complete because $H$ is defined only on the equations in $E_1$ and not
on the equations in $E_2$. In addition, since the range of $H$ contains no parameters in
$P_2$, it follows that no edge in $U$ is incident on any equation in $E_2$ or any parameter
in $P_2$. In particular, no edge in $U$ is incident on the equation $int(q_1, dq_1/dt)$ or on
the parameter $dq_1/dt$.

For example, the set $U$ corresponding to the causal mapping $H$ defined in Figure 6.1
is the set

{{dv/dt  =  u,u),{du/dt  =  v,v),{dw/dt  = 

This  is  shown  graphically  in  Figure  6.2a,  where  U  is  the  set  of  bold  edges  with  arrows 
at  both  ends. 

Define a second bipartite graph $G' = (X, Y, R')$ that is exactly like $G$, except that
$G'$ contains the following additional set of edges:

$$W = \bigcup_{2 \le i \le m} \{(int(q_i, dq_i/dt), dq_i/dt)\}$$

i.e., the edges in $W$ connect int equations for which $H$ is undefined to the corresponding
derivatives, but $W$ does not connect $int(q_1, dq_1/dt)$ to $dq_1/dt$. Since $G'$ has more
edges than $G$, but is otherwise identical to $G$, it follows that any matching of $G$ is
also a matching of $G'$. Hence, since $G$ contains a complete matching, it follows that
$G'$ contains a complete matching.

For example, Figure 6.3 shows the graph $G'$ corresponding to the graph shown in
Figure 6.2a, with $q_1$ being the parameter $v$. The bold edges with arrows correspond
to the set $U$, while the bold edges without arrows correspond to the set $W$.

Let $V$ be the union of $U$ and $W$. Since no edge in $U$ is incident on an equation in
$E_2$ or a parameter in $P_2$, it is easy to verify that $V$ is a matching in $G'$. However, $V$
is not a complete matching. In fact, the only two nodes in $G'$ that $V$ does not match
are the nodes corresponding to equation $int(q_1, dq_1/dt)$ and parameter $dq_1/dt$. Since
$G'$ contains a complete matching, it follows that there is an augmenting path in $G'$
with respect to $V$ (augmenting paths are defined in Definition D.3). This is a direct
consequence of Lemma D.2.

[Figure 6.3: Example of the graph $G'$. Equation nodes: $dv/dt = u$, $du/dt = v$, $int(u, du/dt)$, $int(v, dv/dt)$, $dw/dt = w$, $int(w, dw/dt)$; parameter nodes: $u$, $v$, $du/dt$, $dv/dt$, $w$, $dw/dt$.]

For  example,  the  following  sequence  is  an  augmenting  path  in  the  graph  shown  in 
Figure  6.3: 

$$\{int(v, dv/dt),\ v,\ du/dt = v,\ du/dt,\ int(u, du/dt),\ u,\ dv/dt = u,\ dv/dt\} \qquad (6.7)$$

This  sequence  is  the  same  as  the  one  shown  in  Equation  6.4. 
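
An augmenting path of this kind can be found by a depth-first search that leaves each equation along a non-matching edge and returns from each parameter along its matching edge. The sketch below is a simplified stand-in for the algorithm of Appendix D, not a transcription of it. The adjacency lists are our assumption about which parameters each equation can causally determine, restricted to the $u$, $v$ part of the example, and the edge from `int(u, du/dt)` to `du/dt` is the single edge of $W$.

```python
def augmenting_path(start, edges, matching):
    """Depth-first search for an augmenting path beginning at `start`.

    edges    -- dict: equation -> list of adjacent parameters (graph G')
    matching -- dict: equation -> matched parameter (the matching V)
    Returns the alternating list [e1, p1, ..., en, pn], or None.
    """
    owner = {p: e for e, p in matching.items()}  # parameter -> its equation
    visited = set()

    def search(eq):
        for p in edges[eq]:
            if p in visited or matching.get(eq) == p:
                continue  # leave an equation only along a non-matching edge
            visited.add(p)
            if p not in owner:          # unmatched parameter: path is complete
                return [eq, p]
            rest = search(owner[p])     # follow the matching edge back
            if rest is not None:
                return [eq, p] + rest
        return None

    return search(start)

# Graph G' (assumed edges); the last entry of int(u, du/dt) is the W edge.
edges = {
    "dv/dt = u": ["dv/dt", "u"],
    "du/dt = v": ["du/dt", "v"],
    "int(u, du/dt)": ["u", "du/dt"],
    "int(v, dv/dt)": ["v"],
}
# V = U union W, with q_1 = v left unmatched.
V = {"dv/dt = u": "u", "du/dt = v": "v", "int(u, du/dt)": "du/dt"}

path = augmenting_path("int(v, dv/dt)", edges, V)
print(path)
```

On this input the search visits the nodes in the same order as the sequence in Equation 6.7.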

In general, let

$$S = (e_1, p_1, e_2, p_2, \ldots, e_n, p_n), \quad \text{for some } n \ge 1$$

be such an augmenting path, where each $e_i$ is (an equation) in $X$ and each $p_i$ is (a
parameter) in $Y$. We claim that $S$ satisfies conditions 1-6 in Lemma 6.1:

1. Since $S$ is an augmenting path, no node is repeated in $S$. Hence, no equation
or parameter is repeated in $S$.

2. Since $S$ is an augmenting path, it follows that $(e_i, p_i)$, $1 \le i \le n$, is an edge
in $R'$, such that $(e_i, p_i)$ is not in $V$. Since $W$ is the set of edges in $R'$ that are
not in $R$, and $W \subseteq V$, it follows that $(e_i, p_i)$ is an edge in $R$. Hence, using
Definition 3.5, we conclude that $p_i \in P_c(e_i)$, for $1 \le i \le n$.

3. Since $S$ is an augmenting path, no edge in $V$ is incident on $e_1$. Hence, $H$ is
undefined for $e_1$.

4. Since $S$ is an augmenting path, no edge in $V$ is incident on $p_n$. Hence, $p_n$ is not
in the range of $H$.

5. Since $S$ is an augmenting path, it follows that $(e_i, p_{i-1})$, $2 \le i \le n$, is an edge
in $V$. Hence, either $H(e_i) = p_{i-1}$, or $(e_i, p_{i-1}) \in W$. If $(e_i, p_{i-1}) \in W$, then $H$
is not defined for $e_i$. Hence, $e_i$ is of the form $int(q, dq/dt)$ for some $q$, and $p_{i-1}$
is $dq/dt$. Hence, $p_{i-1} \in P(e_i)$.

Hence, for each $e_i$, $2 \le i \le n$, if $H$ is defined for $e_i$, then $H(e_i) = p_{i-1}$. If $H$ is
not defined for $e_i$, then $p_{i-1} \in P(e_i)$.

6. Since $int(q_1, dq_1/dt)$ and $dq_1/dt$ are the only two nodes on which an edge in
$V$ is not incident, it follows that $e_1$ is $int(q_1, dq_1/dt)$ and $p_n$ is $dq_1/dt$. Hence,
$p_n \in P(e_1)$.

Hence, the augmenting path $S$ is an alternating sequence of equations and parameters
that satisfies conditions 1-6 in Lemma 6.1.

Now we show that for any equation $int(q, dq/dt)$ for which $H$ is not defined, the
equation $int(q, dq/dt)$ occurs in $S$ if and only if $dq/dt$ occurs in $S$. This clearly holds
for $q = q_1$, since $int(q_1, dq_1/dt)$ and $dq_1/dt$ are both in $S$. Now suppose that the
equation $int(q_i, dq_i/dt)$, $2 \le i \le m$, occurs in $S$, and let it be the equation $e_j$, for
some $2 \le j \le n$. Since $S$ is an augmenting path, it follows that $(e_j, p_{j-1}) \in V$. But
this is only possible if $p_{j-1}$ is $dq_i/dt$, since no edge in $U$ is incident on $int(q_i, dq_i/dt)$.
Hence $dq_i/dt$ occurs in $S$. A symmetric argument shows that if $dq_i/dt$ occurs in $S$
then $int(q_i, dq_i/dt)$ occurs in $S$. Hence, if $H$ is not defined for $int(q, dq/dt)$, then
$int(q, dq/dt)$ occurs in $S$ if and only if $dq/dt$ occurs in $S$. □

The  above  lemma  shows  that  if  H  is  augmentable,  then  a  sequence  S,  with  the 
right  properties,  exists  such  that  H  can  be  extended.  The  next  lemma  shows  that 
the  resulting  causal  mapping  is  either  complete  or  is  itself  augmentable. 

Lemma 6.4 Let $E$ be a complete set of equations, and let $H : ic(E) \to P(E)$ be an
augmentable causal mapping. Then there exists a causal mapping $H' : ic(E) \to P(E)$
such that

1. $H'$ is defined on more equations in $ic(E)$ than $H$;

2. $C_H \subseteq tc(C_{H'})$; and

3. either $H'$ is complete, or $H'$ is augmentable with respect to $E$.

$H'$ is called an augmentation of $H$.

Proof: For any augmentable causal mapping $H$, Lemma 6.3 tells us that there exists
an alternating sequence of equations and parameters

$$S = \{e_1, p_1, e_2, p_2, \ldots, e_n, p_n\}$$

that satisfies conditions 1-6 of Lemma 6.1. Let $H'$ be defined from $H$ and $S$ in the
same way that it was in the statement of Lemma 6.1:

$$H'(e) = \begin{cases}
p_i & \text{if } e \text{ is } e_i,\ 1 \le i \le n \\
H(e) & \text{if } e \text{ is not } e_i \text{ for any } 1 \le i \le n, \text{ and } H \text{ is defined for } e \\
\text{undefined} & \text{otherwise}
\end{cases}$$

Lemma 6.2 tells us that $C_H \subseteq tc(C_{H'})$.

One can check from the above definition that $H'$ is defined on every equation on
which $H$ is defined, on every equation that occurs in $S$, and no other. Similarly,
one can check that every parameter in the range of $H$ is in the range of $H'$, every
parameter that occurs in $S$ is in the range of $H'$, and no other parameters are in the
range of $H'$.

In addition to $H'$ being defined on every equation on which $H$ is defined, $H'$ is also
defined on $e_1$, but $H$ is not. Hence, $H'$ is defined on more equations in $ic(E)$ than $H$.

Since $H'$ is defined on every equation on which $H$ is defined, and $H$ is augmentable,
it follows that the only equations on which $H'$ may not be defined are of the form
$int(q, dq/dt)$. Lemma 6.3 tells us that $int(q, dq/dt)$ occurs in $S$ if and only if $dq/dt$ occurs
in $S$. Hence, $H'$ is defined on $int(q, dq/dt)$ if and only if $dq/dt$ is in the range of $H'$.
Hence, if $H'$ is not defined on some equation, it follows that $H'$ is augmentable with
respect to $E$. Otherwise, $H'$ is a complete causal mapping. □

For example, augmenting the causal mapping $H$ shown in Figure 6.2a using the sequence
shown in Equation 6.7, results in the causal mapping $H'$ shown in Figure 6.2b.
One can easily verify that $H'$ is augmentable.

We can now recursively apply the above lemma to $H'$, until we are left with an
onto causal mapping. For example, applying the above lemma to the causal mapping
$H'$ of Figure 6.2b results in the onto causal mapping $H''$ shown in Figure 6.4.

[Figure 6.4: The resulting onto causal mapping $H''$, defined on the equations $dv/dt = u$, $du/dt = v$, $int(u, du/dt)$, $int(v, dv/dt)$, $dw/dt = w$, and $int(w, dw/dt)$.]


The  following  lemma  merely  formalizes  the  above  recursive  procedure. 
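
The recursive procedure can be sketched as a loop that repeatedly builds $G'$ and $V$, finds an augmenting path, and extends the mapping along it. This is an illustrative sketch under our own assumptions, not the algorithm as implemented by the author: the sets `pc` (parameters each equation can causally determine), the initial mapping for the $u$, $v$, $w$ example, and all function names are hypothetical, and the choice of $q_1$ as the first undefined int equation is arbitrary.

```python
def augmenting_path(start, edges, matching):
    """Alternating path from an unmatched equation to an unmatched parameter."""
    owner = {p: e for e, p in matching.items()}
    visited = set()

    def search(eq):
        for p in edges[eq]:
            if p in visited or matching.get(eq) == p:
                continue
            visited.add(p)
            if p not in owner:
                return [eq, p]
            rest = search(owner[p])
            if rest is not None:
                return [eq, p] + rest
        return None

    return search(start)

def augment_until_onto(pc, int_eqs, mapping):
    """Repeat the augmentation step until the mapping is defined everywhere.

    pc      -- dict: equation -> parameters it can causally determine
    int_eqs -- dict: int equation -> its derivative
    mapping -- dict: equation -> parameter (an augmentable mapping H)
    """
    mapping = dict(mapping)
    while True:
        undefined = [e for e in pc if e not in mapping]
        if not undefined:
            return mapping
        first, others = undefined[0], undefined[1:]
        # G' adds the W edges; V adds them to the matching as well.
        edges = {e: list(ps) for e, ps in pc.items()}
        matching = dict(mapping)
        for e in others:
            edges[e].append(int_eqs[e])
            matching[e] = int_eqs[e]
        path = augmenting_path(first, edges, matching)
        if path is None:
            raise ValueError("no augmenting path: mapping is not augmentable")
        # H' sends each e_i on the path to p_i and agrees with H elsewhere.
        for e, p in zip(path[0::2], path[1::2]):
            mapping[e] = p

pc = {
    "dv/dt = u": ["dv/dt", "u"],
    "du/dt = v": ["du/dt", "v"],
    "dw/dt = w": ["dw/dt", "w"],
    "int(v, dv/dt)": ["v"],
    "int(u, du/dt)": ["u"],
    "int(w, dw/dt)": ["w"],
}
int_eqs = {"int(v, dv/dt)": "dv/dt", "int(u, du/dt)": "du/dt",
           "int(w, dw/dt)": "dw/dt"}
H = {"dv/dt = u": "u", "du/dt = v": "v", "dw/dt = w": "w"}

H_onto = augment_until_onto(pc, int_eqs, H)
print(H_onto)
```

With these inputs the loop runs two augmentation steps: the first uses the path of Equation 6.7, and the second handles the $w$ equations.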

Lemma 6.5 Let $E$ be a complete set of equations and let $H : ic(E) \to P(E)$ be an
augmentable causal mapping with respect to $E$. Then there exists an onto causal mapping
$H' : ic(E) \to P(E)$ such that $C_H \subseteq tc(C_{H'})$, i.e., the direct causal dependencies
entailed by $H$ are a subset of the transitive closure of the direct causal dependencies
entailed by $H'$.


Proof: To prove this lemma, we construct a finite sequence of partial causal mappings
$H_i : ic(E) \to P(E)$, $1 \le i \le k$, such that:

$$H_1 = H$$

$$H_k = H'$$

$$H_{i+1} \text{ is an augmentation of } H_i, \quad 1 \le i \le k-1$$

This sequence is well defined because Lemma 6.4 tells us that for every augmentable
causal mapping, there exists an augmentation which is either complete or augmentable.
Furthermore, Lemma 6.4 also tells us that if $H_{i+1}$ is an augmentation of
$H_i$, then $H_{i+1}$ is defined on more equations than $H_i$. Hence, $H_{i+1}$ is undefined on
fewer equations than $H_i$. Hence, the above sequence must be finite, as the number
of equations on which the augmentations are undefined decreases monotonically to 0,
at which point the augmentation is complete.

Finally, Lemma 6.4 also tells us that if $H_{i+1}$ is an augmentation of $H_i$, then $C_{H_i} \subseteq
tc(C_{H_{i+1}})$. Hence, by transitivity, it follows that $C_H = C_{H_1} \subseteq tc(C_{H_k}) = tc(C_{H'})$. □
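
The transitivity step can be spelled out as follows. This is a short derivation we add for clarity. It relies only on two standard properties of the transitive closure: it is monotone with respect to set inclusion, and applying it twice gives the same result as applying it once.

```latex
\begin{align*}
C_{H_i} \subseteq tc(C_{H_{i+1}})
  &\;\Longrightarrow\;
  tc(C_{H_i}) \subseteq tc\bigl(tc(C_{H_{i+1}})\bigr) = tc(C_{H_{i+1}}), \\
C_H = C_{H_1} \subseteq tc(C_{H_1})
  &\subseteq tc(C_{H_2}) \subseteq \cdots \subseteq tc(C_{H_k}) = tc(C_{H'}).
\end{align*}
```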

We are now in a position to prove a version of Theorem 5.2 that includes differential
equations. Given complete models $M_1$ and $M_2$, such that $M_2 \le M_1$, the idea
is to use an onto causal mapping on $ic(E(M_2))$ to construct an augmentable causal
mapping on $ic(E(M_1))$. The above lemma is then invoked to construct an onto causal
mapping on $ic(E(M_1))$.

Theorem 6.1 Let $\mathcal{I}$ be an instance of the MINIMAL CAUSAL MODEL problem such
that all the approximation relations are causal approximations (as in Definition 6.3),
and the contradictory relation partitions the set $\mathcal{M}$ of model fragments into the set
$\mathcal{A}$ of assumption classes. Let $M_1, M_2 \subseteq \mathcal{M}$ be complete models such that $M_1$ and
$M_2$ contain model fragments from the same assumption classes, and $M_2 \le M_1$. The
causal relations entailed by the equations of $M_2$ are a subset of the causal relations
entailed by the equations of $M_1$, i.e., $C(E(M_2)) \subseteq C(E(M_1))$.

Proof: Let $H_2 : ic(E(M_2)) \to P(M_2)$ be an onto causal mapping. $H_2$ must exist
because $M_2$ is complete. We now use $H_2$ to construct an onto causal mapping $H_1 :
ic(E(M_1)) \to P(M_1)$, such that $C_{H_2} \subseteq tc(C_{H_1})$.

Let $\textit{same-ac}(m, M)$ denote the model fragment $m' \in M$ such that $m$ and $m'$ are
in the same assumption class. Since $M_1$ and $M_2$ have model fragments from the same
assumption classes, it follows that for any $m_1 \in M_1$ and $m_2 \in M_2$, the expressions
$\textit{same-ac}(m_1, M_2)$ and $\textit{same-ac}(m_2, M_1)$ are well defined.

For any model fragment $m_1 \in M_1$, let $\textit{not-equil}(m_1)$ denote the set of equations
in $m_1$ that are not equilibrated in $\textit{same-ac}(m_1, M_2)$, and let $\textit{not-equil}(M_1)$ denote the
union of the not-equil equations in the model fragments of $M_1$. Similarly, for any
model fragment $m_2 \in M_2$, let $\textit{not-equil}(m_2)$ denote the set of equations in $m_2$ that
are not equilibrated versions of equations in $\textit{same-ac}(m_2, M_1)$, and let $\textit{not-equil}(M_2)$
denote the union of the not-equil equations in the model fragments of $M_2$.

Let $m_1 \in M_1$ be any model fragment, and let $m_2 \in M_2$ be such that $m_2 =
\textit{same-ac}(m_1, M_2)$. If $m_1$ and $m_2$ are not identical, then it follows that $m_2$ is a causal
approximation of $m_1$. From the discussion at the end of Sections 6.1 and 6.3,
it follows that, if we let the equations of $m_1$ be $ic(\textit{not-equil}(m_1))$, and the equations
of $m_2$ be $ic(\textit{not-equil}(m_2))$, then $m_2$ is a causal approximation of $m_1$ according to
Definition 5.3. Hence, it follows that, if we let the equations of each model fragment
$m \in M_1 \cup M_2$ be $ic(\textit{not-equil}(m))$, then Lemma 5.4 applies, i.e., any causal orientation
of the equations in $ic(\textit{not-equil}(M_2))$ can be globally extended to a causal orientation
of the equations in $ic(\textit{not-equil}(M_1))$.

Using the above observation, we define a partial causal mapping $H : ic(E(M_1)) \to
P(M_1)$ as follows. Let $H$ restricted to the equations in $ic(\textit{not-equil}(M_1))$ be the
global extension of $H_2$ restricted to the equations in $ic(\textit{not-equil}(M_2))$. Lemma 5.5
tells us that the direct causal dependencies entailed by $H$ restricted to the equations
in $ic(\textit{not-equil}(M_1))$ are a superset of the direct causal dependencies entailed by $H_2$
restricted to the equations in $ic(\textit{not-equil}(M_2))$.

Now, consider any equation $e_2$ that is in $ic(E(M_2))$ but not in $ic(\textit{not-equil}(M_2))$.
One can see that, by definition, $e_2$ is an equilibrated version of some equation $e_1$ that
is in $E(M_1)$ but not in $ic(\textit{not-equil}(M_1))$. Hence, we can extend $H$ to $e_1$ by letting
$H(e_1) = H_2(e_2)$. Since $e_2$ is an equilibrated version of $e_1$, this extension preserves the
fact that the direct causal dependencies entailed by $H$ are a superset of the direct
causal dependencies entailed by $H_2$. Since we have extended the causal orientation
of each of the equations in $ic(E(M_2))$, it follows that $C_{H_2} \subseteq C_H$.

We will now construct the onto causal mapping $H_1$ from $H$. If $H$ is an onto
causal mapping, then we make $H_1$ identical to $H$, and hence $C_{H_2} \subseteq C_{H_1}$, and hence
$C_{H_2} \subseteq tc(C_{H_1})$.

On the other hand, suppose that $H$ is a partial causal mapping. We now show that
$H$ is an augmentable causal mapping with respect to $E(M_1)$ (see Definition 6.4). First,
note that the equations in $ic(E(M_1))$ can be partitioned into three subsets: (a) the
integration completion of the equations that are not equilibrated; (b) the equations
that are equilibrated; and (c) the int equations corresponding to the equations that
are equilibrated. It is easy to verify that $H$ is undefined only on the equations listed
under (c), i.e., $H$ is undefined only on the int equations corresponding to the equations
that are equilibrated.

Second, if $H$ is undefined for $int(q, dq/dt)$, then $dq/dt$ cannot be in the range of
$H$. This follows from the following facts. Let $e$ be the equation that can causally
determine $dq/dt$ in $E(M_1)$. From (c) above, we know that $e$ has been equilibrated
in $E(M_2)$. Let the equilibrated version of $e$ be $e'$, so that $H(e) = H_2(e')$. Since $e'$ is
the equilibrated version of $e$, it follows that $H_2(e') \ne dq/dt$, and hence $H(e) \ne dq/dt$.
Since $e$ is the only equation in $E(M_1)$ that can causally determine $dq/dt$, it follows
that $dq/dt$ is not in the range of $H$. Hence, whenever $H$ is undefined for $int(q, dq/dt)$,
it follows that $dq/dt$ is not in the range of $H$. Finally, since $E(M_1)$ is complete and
hence $|ic(E(M_1))| = |P(M_1)|$, it follows that every other parameter in $P(M_1)$ is in
the range of $H$.

Hence, from the above observations, and from Definition 6.4, it follows that $H$ is
augmentable with respect to $E(M_1)$. But Lemma 6.5 tells us that there exists an onto
causal mapping $H_1 : ic(E(M_1)) \to P(M_1)$ such that $C_H \subseteq tc(C_{H_1})$. Since $C_{H_2} \subseteq C_H$,
it follows that $C_{H_2} \subseteq tc(C_{H_1})$.

Hence, whether or not $H$ is onto, it follows that $C_{H_2} \subseteq tc(C_{H_1})$, and hence
$C(E(M_2)) \subseteq C(E(M_1))$. □

Hence, we have succeeded in generalizing Theorem 5.2 to models containing differential
equations. This generalization required an updated definition of causal approximations.
Hence, we require the following restriction on $\mathcal{I}$:

• All the approximation relations are causal approximations as in Definition 6.3.


6.6  Efficiently  equilibrating  differential  equations 

In the previous section, we introduced an updated definition of a causal approximation,
and used this definition to generalize Theorem 5.2 to models that involve
differential equations. Unfortunately, even with this updated definition, a coherent
model $M$ can have an exponential number of immediate simplifications, all of which
use model fragments from the same assumption classes as $M$. In Section 6.6.1, we
illustrate this with an example, and show that the source of the problem is that we
have placed no restrictions on the equilibrations of differential equations. We then
address this problem by introducing locally self-regulating parameters, and show that
when all the parameters are locally self-regulating, Lemma 5.3 can be generalized to
model fragments involving differential equations. This allows us to generalize Theorem 5.4,
ensuring that model fragments of a coherent model can be approximated
one at a time.


6.6.1 Equilibrating differential equations can be hard

Consider the two assumption classes $A_1$ and $A_2$, and the six model fragments $m_1$,
$m_{11}$, $m_{12}$, $m_2$, $m_{21}$, and $m_{22}$, shown in Figure 6.5. Note that all the approximation
relations are causal approximations.

$A_1 = \{m_1, m_{11}, m_{12}\}$            $A_2 = \{m_2, m_{21}, m_{22}\}$

$m_1 = \{dy/dt = x\}$                    $m_2 = \{dx/dt = y\}$
$m_{11} = \{exogenous(x)\}$              $m_{21} = \{exogenous(y)\}$
$m_{12} = \{exogenous(x)\}$              $m_{22} = \{exogenous(y)\}$

$approximation(m_1, m_{11})$             $approximation(m_2, m_{21})$
$approximation(m_1, m_{12})$             $approximation(m_2, m_{22})$

Figure 6.5: Assumption classes and model fragments


Consider the model $M = \{m_1, m_2\}$. Assuming that there are no propositional
coherence constraints, it is easy to see that $M$ is complete, and hence coherent. Now
consider the immediate simplifications of $M$. Replacing $m_1$ by either of its immediate
approximations leads to an overconstrained model. For example, the model $M_1 =
\{m_{11}, m_2\}$ is overconstrained because $ic(E(M_1))$ contains the equations $int(x, dx/dt)$
from the integration completion of $m_2$, and $exogenous(x)$ from $m_{11}$, both of which
can causally determine only $x$. Similarly, replacing $m_2$ by either of its immediate
approximations leads to an overconstrained model.

The only way to get a coherent simplification is to replace both $m_1$ and $m_2$ by
one of their immediate approximations. For example, $M_2 = \{m_{11}, m_{21}\}$ is a coherent
model that is one of the immediate simplifications of $M$. Since both $m_1$ and $m_2$ have
two immediate approximations, it follows that $simplifications(M)$ has four models.
This example can be trivially generalized to have $n$ assumption classes, so that the
number of models in the immediate simplifications of the most accurate model is
$2^n$, i.e., if we allow arbitrary differential equations to be equilibrated, then coherent
models can have an exponential number of immediate simplifications.
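
The count can be reproduced by brute force on small instances. The sketch below makes several assumptions of our own: the generalization to $n$ classes is taken to be a ring in which class $i$ contains $dx_{i+1}/dt = x_i$ together with two $exogenous(x_i)$ approximations (one possible reading of "trivially generalized"), a model is treated as coherent exactly when its equations admit a complete matching onto its parameters, and the parameter sets each equation can determine are chosen by us. It enumerates every way of replacing a nonempty subset of fragments, rather than using the definition of immediate simplification.

```python
from itertools import product

def has_complete_matching(pc):
    """True if equations and parameters can be matched one to one
    (simple augmenting-path bipartite matching)."""
    params = {p for ps in pc.values() for p in ps}
    if len(params) != len(pc):
        return False
    owner = {}  # parameter -> equation currently matched to it

    def try_match(eq, seen):
        for p in pc[eq]:
            if p in seen:
                continue
            seen.add(p)
            if p not in owner or try_match(owner[p], seen):
                owner[p] = eq
                return True
        return False

    return all(try_match(eq, set()) for eq in pc)

def model_equations(choice):
    """Equations (with the parameters each can determine) for one model.

    choice[i] == 0 keeps the differential equation dx_{i+1}/dt = x_i;
    choice[i] in (1, 2) replaces it by one of two exogenous(x_i) fragments.
    """
    n = len(choice)
    pc = {}
    for i, c in enumerate(choice):
        x, nxt = f"x{i}", f"x{(i + 1) % n}"
        if c == 0:
            pc[f"d{nxt}/dt = {x}"] = [f"d{nxt}/dt", x]
            pc[f"int({nxt}, d{nxt}/dt)"] = [nxt]  # integration completion
        else:
            pc[f"exogenous({x}) [variant {c}]"] = [x]
    return pc

def coherent_simplifications(n):
    """Choices other than the base model whose equations are complete."""
    return [c for c in product((0, 1, 2), repeat=n)
            if any(c) and has_complete_matching(model_equations(c))]

for n in (2, 3, 4):
    print(n, len(coherent_simplifications(n)))
```

For $n = 2$ this is the example of Figure 6.5 with $x_0 = x$ and $x_1 = y$. Every surviving choice replaces all $n$ fragments at once, which is why the count doubles with each added class.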

The fundamental problem underlying the above example is that equations cannot,
in general, be equilibrated individually, i.e., to ensure that the resulting model is
coherent, a set of differential equations may have to be equilibrated simultaneously.
Hence, if each equation that has to be equilibrated has multiple equilibrations, it
follows that there will be an exponential number of immediate simplifications.

In addition, it is not clear how we can efficiently identify a minimal set of differential
equations, such that simultaneously equilibrating each equation in that set
results in a coherent model.* Hence, even if each differential equation has just one
equilibration, we would still be unable to generate the immediate simplifications of a
coherent model efficiently.

6.6.2 Self-regulating parameter

We can circumvent the above problem if we restrict ourselves to only those differential
equations that can be equilibrated if and only if they can be equilibrated individually.
Hence, at most one differential equation needs to be equilibrated in any immediate
simplification of a model, and the problem discussed above disappears.

Let  us  now  understand  how  we  can  enforce  this  restriction.  We  start  by  defining 
a  self-regulating  parameter: 

Definition 6.5 (Self-regulating parameter) A parameter $p$ is said to be self-regulating
with respect to a coherent model $M$ if and only if $dp/dt$ causally depends on
$p$ in the causal ordering of the equations of $M$, i.e., $(p, dp/dt) \in C(E(M))$.
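
Definition 6.5 amounts to a reachability test on the causal ordering. The following sketch assumes the causal ordering is given as a set of direct dependencies `(cause, effect)` and that the derivative of a parameter `p` is named `dp/dt`. Both dependency sets are illustrative: the first is the cycle obtained from an onto mapping for a two-equation system in $u$ and $v$, and the second is a hypothetical model in which an exogenous $x$ drives $y$.

```python
def depends_on(direct, source, target):
    """True if there is a path of direct dependencies from source to target."""
    frontier, seen = [source], set()
    while frontier:
        node = frontier.pop()
        for cause, effect in direct:
            if cause == node and effect not in seen:
                if effect == target:
                    return True
                seen.add(effect)
                frontier.append(effect)
    return False

def is_self_regulating(direct, p):
    """Definition 6.5: dp/dt causally depends on p."""
    return depends_on(direct, p, f"d{p}/dt")

# v -> du/dt -> u -> dv/dt -> v (integration closes the loop).
cyclic = {("v", "du/dt"), ("du/dt", "u"), ("u", "dv/dt"), ("dv/dt", "v")}
# x -> dy/dt -> y: nothing feeds back from y to dy/dt.
driven = {("x", "dy/dt"), ("dy/dt", "y")}

print(is_self_regulating(cyclic, "v"))
print(is_self_regulating(driven, "y"))
```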

*We suspect that the problem of identifying such a minimal set of differential equations is
intractable. However, we do not, as yet, have a proof of this hypothesis.

Since $dp/dt$ determines $p$ by integration, it follows that a self-regulating parameter
regulates its own variation over time. We now prove an easy consequence of the
lemmas in Section 6.5. In particular, we show that a differential equation can be
equilibrated only if the corresponding parameter is self-regulating.

Lemma 6.6 Let $M$ be a coherent model, and let $e \in E(M)$ be a differential equation
that can causally determine derivative $dp/dt$. Let $M' \le M$ be a coherent model such
that $e$ has been equilibrated in $M'$. Then $p$ is self-regulating with respect to $M$.

Proof: We provide only a sketch of this proof. Recall from the proof of Theorem 6.1
that an onto causal mapping $F' : ic(E(M')) \to P(M')$ is extended to an onto causal
mapping $F : ic(E(M)) \to P(M)$ using the following steps:†

1. First, $F$ is defined as the global extension of $F'$. One can check that, at this
stage, $dp/dt$ is not in the range of $F$.

2. Then $F$ is successively augmented, using the alternating sequence of equations
and parameters defined in Lemma 6.3. $dp/dt$ is introduced into the range of $F$
when $dp/dt$ occurs in such a sequence.

However, Lemma 6.3 tells us that $int(p, dp/dt)$ occurs in the sequence if and only
if $dp/dt$ occurs in the sequence. But if $int(p, dp/dt)$ occurs in the sequence, it is easy
to see that $p$ must occur in the sequence (using condition 2 of Lemma 6.1). Hence,
both $p$ and $dp/dt$ occur in some sequence during the augmentation. Hence, using
Lemma 6.1, we can infer that $p$ and $dp/dt$ are causally dependent upon each other.
Hence, $p$ is self-regulating with respect to $M$. □

Intuitively, the causal ordering from the equations of $M$ has a causal path from $p$
to $dp/dt$, and back to $p$ via integration, like the following:

$$p \to p_1 \to \cdots \to p_n \to dp/dt \xrightarrow{int} p$$

†This assumes that $M$ and $M'$ have model fragments from the same assumption classes. However,
this proof can be generalized straightforwardly, using the proof of Theorem 5.3.

Equilibrating the equation that causally determines $dp/dt$ can be viewed as removing
the last edge in this path, and making the rest of the edges point in the opposite
direction:

$$p \leftarrow p_1 \leftarrow \cdots \leftarrow p_n$$

Of course, this is only possible if the path can be inverted in the simpler model. This
is not always possible. For example, if one of the edges is an integration link:

$$p \to \cdots \to p_k \to dq/dt \xrightarrow{int} q \to \cdots \to p_n \to dp/dt \xrightarrow{int} p$$

then the integration link from $dq/dt$ to $q$ cannot be inverted. The only way to equilibrate
the equation with this path is to also equilibrate the equation that causally
determines $dq/dt$:

$$p \leftarrow \cdots \leftarrow p_k \qquad\qquad q \leftarrow \cdots \leftarrow p_n$$

The above observations provide us with the condition necessary to ensure that differential
equations can be individually equilibrated. In particular, if there is a causal
path from $p$ to $dp/dt$ that can be inverted, and that contains no integration links, then
the equation that causally determines $dp/dt$ can be individually equilibrated. Let us
say that a parameter $p$ is statically self-regulating, with respect to a coherent model
$M$, if there exists an invertible causal path from $p$ to $dp/dt$ that contains no integration
links. Hence, if in every coherent model, every self-regulating parameter is also
statically self-regulating, then differential equations can be individually equilibrated.

Unfortunately, we have no efficient method of ensuring that in every coherent
model, every self-regulating parameter is also statically self-regulating. Instead, we
restrict ourselves to locally self-regulating parameters. Informally, p is locally
self-regulating if all the parameters in the causal path from p to dp/dt are local to a
model fragment. More precisely, we have the following definition:

Definition 6.6 (Locally self-regulating parameter) Let p be a parameter, and
let m be any model fragment that contains an equation e that can causally determine
dp/dt, and that has a causal approximation that equilibrates e. p is said to be locally
self-regulating if and only if every such model fragment m has a subset $R \subseteq m$ of
equations, called the self-regulating subset of m with respect to p, such that

1. $e \in R$;

2. $|ic(R)| = |Pc(ic(R))|$;

3. if p' is any other locally self-regulating parameter, and if R' is the self-regulating
subset of m with respect to p', then R and R' are disjoint, i.e., $R \cap R' = \emptyset$.

Condition 2 assures us that if a coherent model contains m, then the pre-image of
every parameter in $Pc(ic(R))$, under any onto causal mapping, will be an equation in
$ic(R)$. Hence, every parameter in $Pc(R)$ behaves as if it were local to m. In
conjunction with condition 3, this also means that $Pc(ic(R))$ and $Pc(ic(R'))$ are disjoint.

For example, Figure 6.6 shows a model fragment that describes the velocity of
a falling raindrop [Halliday and Resnick, 1978, page 95]. Figure 6.7 shows a causal
approximation of this model fragment, which equilibrates the first equation, and hence
describes the raindrop's terminal velocity.


$m_r \, dv_r/dt = m_r g - d_r$
$d_r = k v_r$

$v_r$ : Velocity of the raindrop
$m_r$ : Mass of the raindrop
$g$ : Acceleration due to gravity
$d_r$ : Drag
$k$ : Coefficient of drag

Figure  6.6:  Model  fragment  describing  the  velocity  of  a  falling  raindrop 


$0 = m_r g - d_r$
$d_r = k v_r$


Figure  6.7:  Model  fragment  describing  the  raindrop’s  terminal  velocity 
It is easy to verify that $v_r$ is locally self-regulating with respect to this set of model
fragments. In particular, we can use

$R = \{\, m_r \, dv_r/dt = m_r g - d_r,\; d_r = k v_r \,\}$

to verify the conditions in Definition 6.6.
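
The relation between the two raindrop fragments can be illustrated numerically. The following Python sketch integrates the differential equation of Figure 6.6 with a simple forward-Euler scheme and compares the result with the velocity given by the equilibrated fragment of Figure 6.7. The function names, the integration scheme, and the numerical values of the mass, drag coefficient, and time step are illustrative assumptions and do not come from the text.

```python
def simulate_raindrop(m_r=1e-6, k=1e-5, g=9.8, dt=1e-3, steps=5000):
    """Forward-Euler integration of m_r * dv_r/dt = m_r * g - d_r, with d_r = k * v_r.

    All default values are hypothetical and chosen only for illustration.
    """
    v_r = 0.0
    for _ in range(steps):
        d_r = k * v_r                       # drag equation of the fragment
        v_r += dt * (m_r * g - d_r) / m_r   # one Euler step of the dynamic equation
    return v_r


def terminal_velocity(m_r=1e-6, k=1e-5, g=9.8):
    """Equilibrated fragment: 0 = m_r * g - d_r and d_r = k * v_r give v_r = m_r * g / k."""
    return m_r * g / k


if __name__ == "__main__":
    print(simulate_raindrop(), terminal_velocity())
```

With these assumed values the time constant m_r / k is much shorter than the simulated interval, so the simulated velocity settles at the value predicted by the equilibrated fragment. This is the sense in which equilibration assumes that the dynamic behavior is much faster than the time-scale of interest.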

Note that, in Definition 6.6, there is no guarantee that the parameter p is
self-regulating with respect to every model that contains m. However, we can show that
if the equation e can be successfully equilibrated, then dp/dt causally depends on p in
any coherent model that includes m. In such cases, the causal dependence of dp/dt
on p is mediated by parameters in $Pc(ic(R))$ (e, m, and R are as in Definition 6.6).

Instead  of  proving  the  above  claim,  we  show  that  Lemma  5.3  can  be  generalized  to 
model  fragments  involving  differential  equations.  This  will  ensure  that  Theorem  5.4 
will apply to models involving differential equations, so that the efficient model
selection algorithm developed in Section 5.7 can be used. We start by proving some
preliminary  properties  of  the  above  definition. 


Lemma 6.7 Let p, m, e, and R be as in Definition 6.6. Let m' be a causal
approximation of m, let $G : ic(m') \to ic(m)$ be a correspondence mapping, and let L be
the local causal mapping with respect to G. Let $R' \subseteq m'$ be the pre-image of R under G:

$R' = \{\, e' : G(e') \in R \,\}$

Then we have the following:

1. $|ic(R')| = |Pc(ic(R'))|$;

2. parameters in $Pc(ic(R))$ are either in $Pc(ic(R'))$, or they are local to m; and

3. if $e_l$ is an equation in the domain of L, but $e_l \notin ic(R)$, then $L(e_l) \notin Pc(ic(R))$.


Proof: First, we show that $|ic(R')| = |Pc(ic(R'))|$. Let $|ic(R)| - |ic(R')| = k_e$ and
let $k_{eq}$ be the number of equations in R that are equilibrated in R'.^ This means
that $ic(R)$ contains $k_{eq}$ int equations that are not found in $ic(R')$. However, this also
means that $Pc(ic(R))$ contains $k_{eq}$ derivatives that are not in $Pc(ic(R'))$ (since they
have been equilibrated).

^In fact, condition 3 in Definition 6.6 allows us to show that $k_{eq} = 1$.

Let $k_l = k_e - k_{eq}$. Hence, $k_l$ is the number of equations in R that have no
corresponding equations in R'. Since m' is a causal approximation of m, it follows
that there exists a local causal mapping that maps these $k_l$ equations to $k_l$ parameters
that are local to m but not local to m'. Hence, these $k_l$ local parameters must be in
$Pc(R)$, but not in $Pc(R')$. Hence, $|Pc(R)| - |Pc(R')| \geq k_l$.

Hence, from the above two paragraphs, we have the following:

$|ic(R)| - |ic(R')| = k_{eq} + k_l$
$|Pc(ic(R))| - |Pc(ic(R'))| \geq k_{eq} + k_l$

Since $|ic(R)| = |Pc(ic(R))|$, it follows that

$|ic(R')| \geq |Pc(ic(R'))|$

But $|ic(R')|$ cannot be greater than $|Pc(ic(R'))|$, for otherwise m' would be
overconstrained. Hence, $|ic(R')| = |Pc(ic(R'))|$.

We now show that parameters in $Pc(ic(R))$ are either in $Pc(ic(R'))$, or they are
local to m. From the above argument, we can see that the parameters in $Pc(ic(R))$
can be partitioned into three sets: (a) $k_{eq}$ derivatives; (b) $k_l$ local parameters; and (c)
parameters that are in $Pc(ic(R'))$. Since the derivatives are local to m, it follows that
every parameter in $Pc(ic(R))$ is either in $Pc(ic(R'))$, or local to m.

Finally, we show that if $e_l$ is an equation in the domain of L, but $e_l \notin ic(R)$,
then $L(e_l) \notin Pc(ic(R))$. As argued above, $Pc(ic(R))$ contains $k_l$ parameters that are
local to m but not local to m', and $ic(R)$ contains $k_l$ equations that are not
in the range of G. Hence, it follows that L must map these $k_l$ equations to the $k_l$
parameters. Since L is 1-1, it follows that if $e_l \notin ic(R)$, then $L(e_l) \notin Pc(ic(R))$. □

The above properties can be used to guarantee the existence of local extensions,
i.e., we prove an extended version of Lemma 5.3.

Lemma 6.8 Let I be an instance of the MINIMAL CAUSAL MODEL problem in which
all the approximation relations are causal approximations, and all the parameters
are locally self-regulating. Let $m, m' \in M$ be model fragments and let m' be an
approximation of m. Let $F' : ic(m') \to P(ic(m'))$ be a causal mapping. Then there
exists a causal mapping $F : ic(m) \to P(ic(m))$ such that every parameter in the range
of F is either in the range of F' or is local to m.

Proof: Let $e_1, e_2, \ldots, e_n \in m$, for some $n \geq 0$, be the differential equations that
can be equilibrated by some approximation of m (if n = 0 then no equations can
be equilibrated). Let equation $e_i$, $1 \leq i \leq n$, causally determine derivative $dp_i/dt$.
Since all the parameters are locally self-regulating, let $R_i$, $1 \leq i \leq n$, be the
self-regulating subset of m with respect to $p_i$. We know that these self-regulating subsets
are mutually disjoint, i.e., $R_i \cap R_j = \emptyset$, for $1 \leq i, j \leq n$ and $i \neq j$.

Partition the equations in m into $(n+1)$ subsets $R_0, R_1, \ldots, R_n$, where $R_0$ contains
all the equations in m that are not in any of the other subsets.

Let $G : ic(m') \to ic(m)$ be a correspondence mapping, and let L be the local causal
mapping with respect to G. Using G and the above partition of $ic(m)$, partition the
set m' into the $(n+1)$ subsets $R'_0, \ldots, R'_n$ as follows:

$R'_i = \{\, e : G(e) \in R_i \,\}, \quad 0 \leq i \leq n$

i.e., $R'_i$ contains the pre-image of $R_i$ under G.

Let $F'' : ic(m) \to P(ic(m))$ be any causal mapping. We now use F' and F'' to
construct the desired causal mapping F.

For each equation in $ic(R_i)$, $1 \leq i \leq n$, let F be identical to F''. As shown in
Lemma 6.7, every parameter in $Pc(ic(R_i))$ is either in $Pc(ic(R'_i))$, or is local to m. In
addition, Lemma 6.7 also showed that $|ic(R'_i)| = |Pc(ic(R'_i))|$, so that every element
in $Pc(ic(R'_i))$ is in the range of F' restricted to $ic(R'_i)$. Hence, every element in the
range of F, when restricted to $ic(R_i)$, is either in the range of F' restricted to $ic(R'_i)$,
or is local to m.

That only leaves equations in $ic(R_0)$. The equations in $R_0$ have no differential
equations that can be equilibrated. Hence, if we restrict the equations of m to $ic(R_0)$
and the equations of m' to $ic(R'_0)$, it follows that m' is a causal approximation of
m according to Definition 5.3. Hence, the results of the previous chapter apply to
$ic(R_0)$ and $ic(R'_0)$. In particular, Lemma 5.3 tells us that any causal mapping of the
equations of $ic(R'_0)$ can be locally extended to a causal mapping of the equations of
$ic(R_0)$.

Define F on $ic(R_0)$ to be the local extension of F' on $ic(R'_0)$. Hence, the parameters
in the range of this extension are either in the range of F' restricted to $ic(R'_0)$,
or are local to m. From Lemma 6.7, the local parameters used in the range of this
extension are not in $Pc(ic(R_i))$, $1 \leq i \leq n$. Hence, this extension of F is well defined.

Hence, we have defined the causal mapping F such that the parameters in the
range of F are either in the range of F', or are local to m. □

The above generalization of Lemma 5.3 allows us to generalize Theorem 5.4 to
models with differential equations. Hence, the efficient model selection algorithms of
Section 5.7 can be used when all the parameters are locally self-regulating. Hence,
we have the following restriction on I:

• All the parameters of I must be locally self-regulating as defined in
Definition 6.6.

6.6.3  Discussion 

Iwasaki defines a closely related notion of a self-regulating equation [Iwasaki, 1988].
In her definition, a differential equation that can causally determine dp/dt is
self-regulating if the equation can also causally determine p. It is easy to verify that this
is just a special case of p being locally self-regulating. In particular, the causal path
from p to dp/dt is not mediated by any additional parameters, local or otherwise.^

Not all parameters that are encountered in modeling the physical world are locally
self-regulating. For example, consider the case of two objects connected by a heat
path. Figure 6.8 shows the three assumption classes that describe this situation.
Figure 6.9a shows the causal ordering generated from the most accurate model of this
situation. Note that both $dT_1/dt = C_1 f$ and $dT_2/dt = -C_2 f$ can be individually
equilibrated. For example, Figure 6.9b shows the causal ordering resulting from
replacing the first equation by $f = 0$.


^Hence, one can call this direct self-regulation.
[Figure 6.8 body: three columns, one per assumption class (Thermal object 1, Thermal
object 2, Heat path between objects 1 and 2), each showing the most accurate model
fragment with an arrow to its approximation.]

Figure  6.8:  Assumption  classes,  and  their  model  fragments,  describing  the  temper¬ 
ature  of  objects  1  and  2,  and  the  heat  path  between  them.  The  arrows  denote  the 
approximation  relation  between  the  model  fragments. 


(a)  Before  equilibrating  (b)  After  equilibrating 

Figure 6.9: Causal ordering before and after equilibration


It is easy to verify from Figure 6.9a that, even though both $T_1$ and $T_2$ are
self-regulating with respect to this model, neither of them is locally self-regulating (f is not
local to any assumption class). However, note that both $T_1$ and $T_2$ are statically
self-regulating with respect to the most accurate model, so that individual equilibration
is guaranteed. In practice, the differential equations that we have encountered can be
individually equilibrated. Hence, even though our efficient model selection algorithm
is based on the restriction that all the parameters are locally self-regulating, the
program described in Chapter 8 does not impose this restriction. Instead, it implicitly
assumes that all the parameters are statically self-regulating, and hence assumes that
they can be equilibrated if and only if they can be individually equilibrated.
6.7 Summary

In this chapter, we generalized the results of Chapter 5 to models with differential
equations. We started by requiring that all model fragments be in canonical form.
This means that all derivatives were required to be local to some model fragment. We
then discussed two important methods of approximating differential equations:
exogenizing and equilibrating. Exogenizing a differential equation is equivalent to assuming
that the dynamic behavior is much slower than the time-scale of interest.
Equilibrating a differential equation is equivalent to assuming that the dynamic behavior is
much faster than the time-scale of interest. We introduced an updated definition of
a causal approximation, and used this definition to generalize Theorem 5.2 to models
with differential equations.

We then showed that, in the worst case, a coherent model with differential
equations can have an exponential number of immediate simplifications. The root of this
problem was traced to the lack of any restrictions on how differential equations could
be equilibrated. We addressed this problem by introducing locally self-regulating
parameters. We then showed that when all the parameters are locally self-regulating,
a generalized version of Lemma 5.3 can be proved. This generalizes Theorem 5.4,
ensuring that model fragments can be approximated if and only if they can be
individually approximated. Together with the generalization of Theorem 5.2, this means
that the efficient model selection techniques developed in Section 5.7 can be used for
models with differential equations.


Chapter  7 

Order  of  magnitude  reasoning 


In  Chapter  3  we  said  that  the  behavioral  context  of  a  device  is  its  behavior  at  a 
particular  time,  i.e.,  the  values,  at  that  time,  of  the  parameters  used  to  model  the 
device.  Ideally,  we  would  like  the  behavioral  context  to  refer  to  the  actual  behavior 
of  the  device,  e.g.,  the  values  of  the  parameters  obtained  by  actual  measurements  on 
a  physical  prototype.  However,  since  the  actual  behavior  is  usually  unavailable,  we 
content  ourselves  with  computing  the  behavior  from  the  equations  of  a  device  model. 

Different techniques can be used to generate the behavior from the equations of a
device model. At one extreme, purely numerical techniques can be used to solve a set
of equations [Press et al., 1989]. The advantage of such techniques is that predictions
can be made with high precision. The primary disadvantage is that such techniques
require exact numerical values for exogenous parameters. Exact numerical values
are not always available, especially during conceptual design, making such methods
largely inapplicable. At the other extreme, purely qualitative techniques can be used
for behavior generation [Bobrow, 1984; Weld and de Kleer, 1990]. The advantage of
such techniques is that they work with weak qualitative information, e.g., signs of
parameters, and qualitative functional relationships. However, a primary disadvantage
is that the predictions lack the precision of numerical techniques.

In this chapter we discuss a novel order of magnitude reasoning technique for
generating the behavior from the equations of a device model. In this technique, the
order of magnitude of a parameter is defined on a logarithmic scale, and a set of
rules is used to propagate orders of magnitude through equations. A novel feature
of the set of propagation rules is that they allow us to effectively handle non-linear
simultaneous equations, using linear programming in conjunction with backtracking.
This technique has been implemented in a program called NAPIER.^

The  order  of  magnitude  technique  embodied  in  NAPIER  is  at  the  right  level  of 
detail.  On  the  one  hand,  it  does  not  require  exact  numerical  values  for  exogenous 
parameters;  a  more  qualitative  order  of  magnitude  is  enough.  On  the  other  hand, 
unlike  purely  qualitative  techniques,  it  provides  valuable  numerical  information. 

Section 7.1 presents a motivating example that has been used by others working
on order of magnitude reasoning. Section 7.2 presents the basic order of magnitude
reasoning technique, and Section 7.3 analyzes its complexity. Since we show that
order of magnitude reasoning is, in general, intractable, Section 7.4 develops and
empirically evaluates an approximate reasoning technique for order of magnitude
reasoning. Finally, Section 7.5 estimates the error introduced by some of the order of
magnitude rules introduced in Section 7.2, and Section 7.6 discusses related work.


7.1  Motivating  example 

Consider  the  following  example,  previously  discussed  in  [Bennett,  1987;  Raiman, 
1991],  from  the  domain  of  acid-base  chemistry.  An  important  task  in  this  domain 
is  to  find  the  concentration  of  ions  in  a  solution.  The  concentration  of  ions  in 
solution depends on the dynamic equilibrium resulting from competing chemical
reactions. Consider dissolving an acid, AH, in water. The two reversible reactions that
occur,  corresponding  to  the  ionization  of  AH  and  H2O,  are  shown  in  Figure  7.1. 


$AH \rightleftharpoons H^+ + A^-$

$H_2O \rightleftharpoons H^+ + OH^-$

Figure  7.1:  Ionization  reactions  that  occur  on  dissolving  AH  in  water 


^John  Napier  (1550-1617),  a  Scottish  nobleman,  is  credited  with  the  first  discovery  of  logarithms. 
The equilibrium concentrations of the three ions ($H^+$, $OH^-$, $A^-$) and the acid
(AH) are determined by the equations shown in Figure 7.2. Square brackets denote
concentrations; $C_a$ is the initial concentration of the acid; $K_w$ is the ion product of
water; and $K_a$ is the ionization constant of the acid.


Charge balance:                 $[H^+] = [A^-] + [OH^-]$      (7.1)

Mass balance:                   $C_a = [A^-] + [AH]$          (7.2)

Acid ionization equilibrium:    $K_a [AH] = [A^-][H^+]$       (7.3)

Water ionization equilibrium:   $K_w = [OH^-][H^+]$           (7.4)

Figure  7.2:  Equilibrium  equations  for  the  ionization  reactions. 

As has been pointed out in [Bennett, 1987; Raiman, 1991], solving this set of
equations analytically for $[H^+]$ results in a cubic equation which is difficult to solve.
In fact, in problems involving polyprotic acids, i.e., acids that can yield more than
one ion, the closed form solution for $[H^+]$ can involve equations of degree five or
higher, making the solution significantly harder.

An alternative to the above approach is to approximate the equations, and hence
simplify them. For example, a chemist might guess that the acid is strong, so that
$[A^-] \gg [OH^-]$ and $[A^-] \gg [AH]$. This justifies reducing the first equation to
$[H^+] = [A^-]$ and the second equation to $C_a = [A^-]$, leading to a straightforward solution.
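
The strong-acid simplification can be checked numerically against Equations 7.1-7.4. The following Python sketch eliminates $[OH^-]$, $[A^-]$, and $[AH]$ from the four equations and solves the remaining charge-balance condition for $[H^+]$ by bisection. The function name, the bisection bracket, and the sample values of $C_a$ and $K_a$ are illustrative assumptions; this is not how NAPIER works, only a way to see when the chemist's guess is reasonable.

```python
import math


def solve_acid(c_a, k_a, k_w=1e-14):
    """Solve Equations 7.1-7.4 numerically; returns the four concentrations.

    From 7.2 and 7.3: [A-] = k_a * c_a / ([H+] + k_a); from 7.4: [OH-] = k_w / [H+].
    Substituting both into 7.1 leaves one equation in h = [H+].
    """
    def residual(h):
        return h - k_w / h - k_a * c_a / (h + k_a)   # increasing in h for h > 0

    lo, hi = 1e-16, c_a + 1.0        # assumed bracket: residual(lo) < 0 < residual(hi)
    for _ in range(200):
        mid = math.sqrt(lo * hi)     # bisect on a logarithmic scale
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    h = math.sqrt(lo * hi)
    a = k_a * c_a / (h + k_a)
    return {"H+": h, "OH-": k_w / h, "A-": a, "AH": a * h / k_a}


if __name__ == "__main__":
    # Hypothetical strong acid: C_a = 0.1, K_a = 1000.
    for name, value in solve_acid(0.1, 1e3).items():
        print(name, value)
```

For the assumed strong acid, the computed $[H^+]$ and $[A^-]$ are both very close to $C_a$, while $[OH^-]$ and $[AH]$ are smaller by several orders of magnitude, which is what the simplified equations predict.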

The reasoning following the assumptions that $[A^-] \gg [OH^-]$ and $[A^-] \gg [AH]$ is
very nicely formalized in [Raiman, 1991]. But how are these assumptions justified?
In [Bennett, 1987], Bennett suggests that such assumptions are justified by
domain-specific inference rules. A much better, domain-independent method for justifying
such assumptions is embodied in NAPIER. NAPIER can propagate the order of
magnitude of exogenous parameters like $C_a$, $K_w$, and $K_a$ through a set of equations like
Equations 7.1-7.4 to compute orders of magnitude of the remaining parameters like
$[H^+]$, $[OH^-]$, $[A^-]$, and $[AH]$. The computed orders of magnitude of $[A^-]$, $[OH^-]$,
and $[AH]$ can be used to justify the above assumptions and simplify the equations.

We  now  describe  the  order  of  magnitude  reasoning  technique  embodied  in  NAPIER. 
7.2 Order of magnitude reasoning in NAPIER

Order of magnitude reasoning in NAPIER is a form of interval reasoning. The order
of magnitude of a parameter q (denoted om(q)) is defined as follows:

$om(q) = \lfloor \log_b |q| \rfloor$      (7.5)

where the base, b, of the logarithm is chosen to be the smallest number that can be
considered to be "much larger" than 1. The choice of b is clearly domain and task
dependent. Here we assume that b = 10. Note that the order of magnitude of a
parameter is an integer, with om(q) = n being equivalent to:

$b^n \leq |q| < b^{n+1}$

Similarly, $n_1 \leq om(q) \leq n_2$ is equivalent to

$b^{n_1} \leq |q| < b^{n_2+1}$

Note also that the order of magnitude of a parameter is independent of its sign, and
hence om(q) = om(-q). In what follows, we assume that the signs of all parameters
have been determined, to the extent possible, prior to any reasoning about orders of
magnitude using standard constraint satisfaction techniques.^
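
Equation 7.5 translates directly into code. The sketch below is a minimal Python rendering of the definition; the function name `om` follows the text, while the `base` argument, the rejection of zero, and the two correction loops (which guard against floating-point rounding in the logarithm) are assumptions added for illustration.

```python
import math


def om(q, base=10.0):
    """Order of magnitude of q: the integer n with base**n <= |q| < base**(n + 1)."""
    if q == 0:
        raise ValueError("the order of magnitude of 0 is undefined")
    magnitude = abs(q)
    n = math.floor(math.log(magnitude) / math.log(base))
    # Correct for rounding error in the floating-point logarithm.
    while base ** n > magnitude:
        n -= 1
    while base ** (n + 1) <= magnitude:
        n += 1
    return n


if __name__ == "__main__":
    print(om(250), om(-250), om(0.05), om(8, base=2))
```

Because the absolute value is taken first, `om(250)` and `om(-250)` agree, as noted above for om(q) = om(-q).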

7.2.1  Inference  rules  in  NAPIER 

Given the orders of magnitude of $q_1$ and $q_2$, NAPIER computes bounds on the orders
of magnitude of arithmetic expressions involving $q_1$ and $q_2$, using the rules shown in
Figure 7.3. The rules for $(q_1 + q_2)$ and $(q_1 - q_2)$ assume that $q_1$ and $q_2$ have the same
sign, so that the magnitudes of $q_1$ and $q_2$ are actually being added or subtracted,
respectively. The rule for $(q_1 \pm q_2)$ is applicable to a sum or difference of $q_1$ and $q_2$
when the sign of at least one of $q_1$ and $q_2$ is unknown.

The rules for $(q_1 * q_2)$ and $(q_1 / q_2)$ (rules 1 and 2) follow directly from Equation 7.5
and the rules of interval arithmetic [Moore, 1979]. For example, if $om(q_1) = n_1$ and

^This assumption is unnecessarily strong. For example, if a and b are positive, constraint
satisfaction alone is unable to deduce the sign of a - b. However, if om(a) > om(b), then a - b can be
deduced to be positive.
1. $om(q_1) + om(q_2) \leq om(q_1 * q_2) \leq om(q_1) + om(q_2) + 1$

2. $om(q_1) - om(q_2) - 1 \leq om(q_1 / q_2) \leq om(q_1) - om(q_2)$

3. a) $om(q_1) \leq om(q_1 + q_2) \leq om(q_1) + 1$ if $om(q_1) = om(q_2)$
   b) $om(q_1 + q_2) = om(q_1)$ if $om(q_1) > om(q_2)$
   c) $om(q_1 + q_2) = om(q_2)$ if $om(q_1) < om(q_2)$

4. a) $om(q_1 - q_2) \leq om(q_1)$ if $om(q_1) = om(q_2)$
   b) $om(q_1 - q_2) = om(q_1)$ if $om(q_1) > om(q_2)$
   c) $om(q_1 - q_2) = om(q_2)$ if $om(q_1) < om(q_2)$

5. a) $om(q_1 \pm q_2) \leq om(q_1) + 1$ if $om(q_1) = om(q_2)$
   b) $om(q_1 \pm q_2) = om(q_1)$ if $om(q_1) > om(q_2)$
   c) $om(q_1 \pm q_2) = om(q_2)$ if $om(q_1) < om(q_2)$

Figure 7.3: Rules for order of magnitude reasoning. In rules 3 and 4, $q_1$ and $q_2$ are
assumed to have the same sign. Rule 5 assumes that the sign of at least one of $q_1$ or
$q_2$ is unknown.


$om(q_2) = n_2$, it follows that $b^{n_1} \leq |q_1| < b^{n_1+1}$ and $b^{n_2} \leq |q_2| < b^{n_2+1}$. Using interval
arithmetic, we get $b^{n_1+n_2} \leq |q_1 * q_2| < b^{n_1+n_2+2}$, and hence
$n_1 + n_2 \leq om(q_1 * q_2) \leq n_1 + n_2 + 1$.

Like rules 1 and 2, rules 3a and 4a are also based on Equation 7.5 and interval
arithmetic. Note, however, that these rules predict larger intervals for $(q_1 + q_2)$ and
$(q_1 - q_2)$ than interval arithmetic predicts under the same restrictions on $q_1$ and $q_2$.
For example, if $om(q_1) = om(q_2) = n$, then interval arithmetic predicts that $(q_1 + q_2)$
is bounded by $2b^n$ and $2b^{n+1}$, while NAPIER predicts the bounds $b^n$ and $b^{n+2}$. This is
a consequence of NAPIER being able to represent only intervals whose end points are
integer powers of the chosen base. Further, note that rules 3a and 4a are correct only
if the base is greater than 2. This is reasonable given our heuristic for selecting the
base (viz., 2 is unlikely to be considered to be "much larger" than 1).
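
The guaranteed rules (1, 2, 3a, and 4a) can be spot-checked on random positive values. The following Python sketch does this for base 10; the sampling range, the seed, and the helper names are illustrative assumptions, and a passing run is only a sanity check, not a proof.

```python
import math
import random


def om(q, base=10.0):
    """Order of magnitude per Equation 7.5, with a guard against rounding error."""
    n = math.floor(math.log(abs(q)) / math.log(base))
    while base ** n > abs(q):
        n -= 1
    while base ** (n + 1) <= abs(q):
        n += 1
    return n


def check_guaranteed_rules(trials=10000, seed=0):
    """Assert rules 1, 2, 3a, and 4a on random positive pairs (same sign)."""
    rng = random.Random(seed)
    for _ in range(trials):
        q1 = 10 ** rng.uniform(-6, 6)
        q2 = 10 ** rng.uniform(-6, 6)
        n1, n2 = om(q1), om(q2)
        assert n1 + n2 <= om(q1 * q2) <= n1 + n2 + 1      # rule 1
        assert n1 - n2 - 1 <= om(q1 / q2) <= n1 - n2      # rule 2
        if n1 == n2:
            assert n1 <= om(q1 + q2) <= n1 + 1            # rule 3a
            if q1 != q2:                                  # om(0) is undefined
                assert om(q1 - q2) <= n1                  # rule 4a
    return True


if __name__ == "__main__":
    print(check_guaranteed_rules())
```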

Unlike the rules discussed thus far, rules 3b, 3c, 4b, and 4c are not guaranteed to
be correct, but are heuristic rules. They are all based on the intuition that adding
or subtracting a "small" parameter from a "large" parameter does not significantly
affect the larger parameter. Since the base in Equation 7.5 is chosen as the
smallest number that can be considered to be "much larger" than 1, the above intuition
justifies these rules; the order of magnitude of a parameter is not affected by adding
or subtracting parameters of a smaller order of magnitude. The inclusion of these
heuristic order of magnitude rules differentiates NAPIER from standard interval
reasoners. In Section 7.5, we estimate the error introduced by the use of these heuristic
rules.
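
A small numerical case shows why rule 3b is heuristic rather than guaranteed. In the Python sketch below, the sample values and the helper `rule_3b_holds` are illustrative assumptions; both pairs satisfy the precondition $om(q_1) > om(q_2)$, but only the first obeys the rule's conclusion.

```python
import math


def om(q):
    """Order of magnitude for base 10 (Equation 7.5)."""
    return math.floor(math.log10(abs(q)))


def rule_3b_holds(q1, q2):
    """True if om(q1 + q2) = om(q1), as rule 3b predicts when om(q1) > om(q2)."""
    assert q1 > 0 and q2 > 0 and om(q1) > om(q2)
    return om(q1 + q2) == om(q1)


if __name__ == "__main__":
    print(rule_3b_holds(2000.0, 900.0))   # True: 2900 still has order of magnitude 3
    print(rule_3b_holds(9500.0, 900.0))   # False: 10400 has order of magnitude 4
```

For positive parameters with $om(q_1) = n_1 > om(q_2)$, the sum is below $b^{n_1+1} + b^{n_1}$, so when the heuristic fails in this way the true order of magnitude is $n_1 + 1$.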

Finally, rule set 5 merely encompasses both rule sets 3 and 4. It is used to infer
the order of magnitude of a sum or difference of two parameters when the sign of at
least one of the two parameters is not known. To determine the order of magnitude of
a sum or difference of two parameters, NAPIER selects the appropriate rule set from
rule sets 3, 4, and 5, depending on the operation (sum or difference) and the signs of
the two parameters. For example, consider the equation $q_3 = q_1 + q_2$. If $q_1$ and $q_2$
have the same sign, then rule set 3 is used to infer $om(q_3)$; if $q_1$ and $q_2$ have opposite
signs, then rule set 4 is used to infer $om(q_3)$, since then the magnitude of $q_3$ is really
the difference of the magnitudes of $q_1$ and $q_2$; and if the sign of at least one of $q_1$ and
$q_2$ is unknown, then rule set 5 is used to infer $om(q_3)$.
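
The selection of a rule set can be sketched as a small function. In the Python sketch below, the function name and the encoding of signs as "+", "-", or None (unknown) are illustrative assumptions. The case of a difference of parameters with opposite signs is not spelled out in the text; the sketch assumes it is handled by rule set 3, since the magnitudes are then added.

```python
def select_rule_set(op, sign1, sign2):
    """Return 3, 4, or 5: the rule set of Figure 7.3 for q3 = q1 op q2.

    op is "+" or "-"; each sign is "+", "-", or None when unknown.
    """
    if op not in ("+", "-"):
        raise ValueError("op must be '+' or '-'")
    if sign1 is None or sign2 is None:
        return 5                      # sign of at least one parameter unknown
    same_sign = sign1 == sign2
    # Magnitudes are added for a sum of like signs or a difference of unlike signs.
    magnitudes_add = same_sign if op == "+" else not same_sign
    return 3 if magnitudes_add else 4


if __name__ == "__main__":
    print(select_rule_set("+", "+", "+"))   # 3
    print(select_rule_set("+", "+", "-"))   # 4
    print(select_rule_set("-", "+", None))  # 5
```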


7.2.2 Set of simultaneous equations

Until now, we have focussed exclusively on how NAPIER uses a single equation to
propagate orders of magnitude, i.e., how $om(q_1 \; op \; q_2)$ is computed from $om(q_1)$
and $om(q_2)$. However, the rules in Figure 7.3 can also be used to compute orders
of magnitude of parameters related by a set of (possibly non-linear) simultaneous
equations. NAPIER uses these rules to convert a set of simultaneous equations into a
set of constraints, where each constraint is a disjunction of a set of linear inequalities.
Each equation in the set of simultaneous equations contributes a constraint as follows:


1. Product and quotient terms contribute a single set of linear inequalities
according to rules 1 and 2, respectively. For example, $q_3 = q_1 * q_2$ contributes the
following set:

$\{\, om(q_1) + om(q_2) \leq om(q_3),\;\; om(q_3) \leq om(q_1) + om(q_2) + 1 \,\}$

2. Sum and difference terms contribute a disjunction of three sets of linear
inequalities, using rule sets 3, 4, or 5, as applicable. Each disjunct corresponds to one
of the rules (a, b, or c) in the applicable rule set. For example, assuming that
$q_1$ and $q_2$ have the same sign, the equation $q_3 = q_1 - q_2$ contributes the following
disjunction:^

$\{\, om(q_3) \leq om(q_1),\;\; om(q_1) = om(q_2) \,\}$
$\lor$
$\{\, om(q_3) = om(q_1),\;\; om(q_1) \geq om(q_2) + 1 \,\}$
$\lor$
$\{\, om(q_3) = om(q_2),\;\; om(q_1) \leq om(q_2) - 1 \,\}$

corresponding to rules 4a, 4b, and 4c, respectively.
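
The disjunction contributed by a difference term can be evaluated for concrete values. The Python sketch below computes the orders of magnitude of $q_1$, $q_2$, and $q_3 = q_1 - q_2$ and reports which of the three disjuncts hold; the function names and sample values are illustrative assumptions.

```python
import math


def om(q):
    """Order of magnitude for base 10 (Equation 7.5)."""
    return math.floor(math.log10(abs(q)))


def difference_disjuncts(q1, q2):
    """Truth value of each disjunct (rules 4a, 4b, 4c) for q3 = q1 - q2."""
    n1, n2, n3 = om(q1), om(q2), om(q1 - q2)
    return {
        "4a": n3 <= n1 and n1 == n2,
        "4b": n3 == n1 and n1 >= n2 + 1,
        "4c": n3 == n2 and n1 <= n2 - 1,
    }


if __name__ == "__main__":
    print(difference_disjuncts(5000.0, 3500.0))  # only 4a holds
    print(difference_disjuncts(5000.0, 30.0))    # only 4b holds
    print(difference_disjuncts(1050.0, 900.0))   # no disjunct holds
```

The last pair satisfies the precondition of rule 4b, yet the difference 150 has a lower order of magnitude than $q_1$, so none of the disjuncts is true. This is another instance of the heuristic nature of rules 4b and 4c.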

NAPIER  uses  this  set  of  constraints  to  compute  bounds  on  the  orders  of  magnitudes 
of  the  parameters.  Since  all  the  inequalities  in  the  constraints  are  linear  inequalities, 
NAPIER  uses  linear  programming  [Hillier  and  Lieberman,  1980],  in  conjunction  with 
backtracking,  to  compute  order  of  magnitude  bounds.  Backtracking  is  necessary  to 
handle  the  disjunctions.  We  describe  this  algorithm  next. 

7.2.3  Backtracking  algorithm 

Let E denote the set of simultaneous equations being processed. NAPIER's backtracking
procedure is best visualized as a depth-first traversal of a backtrack tree. Each
level  in  the  tree  (except  the  root  level)  corresponds  to  one  of  the  sum  or  difference 
terms  in  E.  The  root  level  corresponds  to  all  the  product  and  quotient  terms  in 
E.  Each  internal  node  has  three  children,  corresponding  to  the  three  disjuncts  in 

^Since the orders of magnitude are integral, $om(q_1) > om(q_2)$ is equivalent to $om(q_1) \geq om(q_2) + 1$.
Root level:  $int = K_a [AH]$,  $int = [A^-][H^+]$,  $K_w = [OH^-][H^+]$

Level 1:     $[H^+] = [A^-] + [OH^-]$   (children: 3a, 3b, 3c)

Level 2:     $C_a = [A^-] + [AH]$       (children of each level 1 node: 3a, 3b, 3c)

Figure  7.4:  A  backtrack  tree. 

the constraint contributed by the sum or difference term at the level of the node's
children. Each node in the tree has an associated set of linear inequalities defined as
follows:

1.  The  set  of  inequalities  at  the  root  node  consists  of  the  union  of  the  sets  of 
inequalities  contributed  by  each  product  and  quotient  term  in  E. 

2. The set of inequalities at each non-root node consists of the union of (a) the
inequalities at the node’s parent; and (b) the inequalities in the disjunct
associated with that node.

Starting  at  the  root  node,  NAPIER  traverses  the  backtrack  tree  in  a  depth-first 
manner.  At  each  node  it  checks  the  consistency  of  the  inequalities  at  that  node.  If 
the  set  is  inconsistent,  it  immediately  backtracks  to  the  node’s  parent.  If  the  set  is 
consistent  and  it  is  a  non-leaf  node,  it  continues  its  depth-first  traversal.  If  the  set 
is  consistent  and  it  is  a  leaf  node,  it  uses  the  inequalities  to  find  the  maximum  and 
minimum values of the order of magnitude of each parameter. The bounds computed
at  each  of  the  consistent  leaf  nodes  are  combined  so  that  the  lower  bound  of  each 
parameter  is  the  least  lower  bound  and  the  upper  bound  is  the  greatest  upper  bound. 

Since the inequalities at each node are linear, NAPIER uses the Simplex linear
programming algorithm [Hillier and Lieberman, 1980; Press et al., 1989] to check their
consistency, and to compute the order of magnitude bounds at leaf nodes. However,
from Equation 7.5 it follows that the order of magnitude of a parameter is integral.
Hence, instead of using linear programming, NAPIER should use integer programming
[Hillier and Lieberman, 1980]. Unfortunately, integer programming is known to be
intractable [Karp, 1972], which leads to severe restrictions on the number of
equations and the size of the backtrack tree that can be handled. Hence, to avoid such
restrictions, NAPIER uses linear programming.

It is important to note that, while bounds computed by linear programming are
not guaranteed to be tight,^4 they are guaranteed to be correct: upper bounds will be
greater than or equal to integer programming upper bounds, and lower bounds will
be less than or equal to integer programming lower bounds. In addition, we have found
that,  in  practice,  linear  programming  bounds  are  usually  integral,  in  which  case  there 
is  no  loss  of  solution  quality. 

7.2.4  Example 

We now illustrate the above procedure using Equations 7.1-7.4. Let us assume
that the exogenous orders of magnitude are as follows: $\mathit{om}(K_w) = -14$,
$\mathit{om}(K_a) = -2$, and $\mathit{om}(C_a) = -5$. This corresponds to a moderately strong
solution of a strong acid. The backtrack tree resulting from these equations is shown
in Figure 7.4. The equations associated with each level are shown on the left of the
tree. Note that Equation 7.3 had to be split into two product terms, with the
introduction of an intermediate variable int. The rules associated with each non-root
node are displayed near each node. Nodes that are filled in are the inconsistent
nodes. For example, the leftmost leaf node can be seen to be inconsistent using the
following line of reasoning. Applying rule 3a to Equations 7.1 and 7.2, we get

\[
\begin{array}{c}
\mathit{om}([OH^-]) = \mathit{om}([A^-]) = \mathit{om}([AH]) \\
\mathit{om}([A^-]) \le \mathit{om}(C_a) \le \mathit{om}([A^-]) + 1 \\
\mathit{om}([A^-]) \le \mathit{om}([H^+]) \le \mathit{om}([A^-]) + 1
\end{array}
\]

Since $\mathit{om}(C_a) = -5$, it follows that the least value of $\mathit{om}([A^-])$ is $-6$. Hence the
least values of $\mathit{om}([OH^-])$ and $\mathit{om}([H^+])$ are also $-6$, and hence the least value of
$\mathit{om}([OH^-][H^+])$ is $-12$. But rule 1 applied to Equation 7.4 requires that:

\[ \mathit{om}([OH^-][H^+]) = \mathit{om}(K_w) = -14 \]

which leads to a contradiction.

^4 A bound $b_1$, interpreted as an interval, is said to be tight with respect to a bound
$b_2$ if $b_1$ and $b_2$ are identical. $b_1$ is looser than $b_2$ if $b_1$ contains $b_2$.

Of  course,  NAPIER  doesn’t  need  the  above  line  of  reasoning  to  infer  inconsistencies; 
it  reaches  the  same  conclusion  using  linear  programming. 

The only consistent set of inequalities at the leaf nodes is at the middle leaf
node, corresponding to assuming that $\mathit{om}([A^-]) > \mathit{om}([OH^-])$ and
$\mathit{om}([A^-]) > \mathit{om}([AH])$. The parameter bounds calculated at this node are as follows:^5

\[ \mathit{om}([H^+]) = -5; \quad \mathit{om}([OH^-]) = (-10, -9); \quad \mathit{om}([AH]) = (-9, -7); \quad \mathit{om}([A^-]) = -5 \]

Since $[A^-]$ is at least two orders of magnitude greater than $[AH]$, and at least four
orders of magnitude greater than $[OH^-]$, a chemist is justified in making the
assumptions that $[A^-] \gg [OH^-]$ and $[A^-] \gg [AH]$. These assumptions can then be used to
simplify the equations, as discussed earlier.

A  slight  variation  of  the  above  example  illustrates  the  importance  of  having  such 
justifications. Suppose that, instead of having $\mathit{om}(C_a) = -5$, we had $\mathit{om}(C_a) = -8$.
This corresponds to a weak solution of the same strong acid. Using this new value
for $\mathit{om}(C_a)$, NAPIER predicts the following bounds on the orders of magnitude:

\[ \mathit{om}([H^+]) = -7; \quad \mathit{om}([OH^-]) = (-8, -7); \quad \mathit{om}([AH]) = (-14, -12); \quad \mathit{om}([A^-]) = -8 \]

These values justify the assumption that $[A^-] \gg [AH]$, but the other assumption,
$[A^-] \gg [OH^-]$, is seen to be completely unjustified. This means that only
Equation 7.2 can be simplified. Hence, NAPIER is a useful tool in justifying the order of
magnitude assumptions that scientists and engineers make in simplifying equations.
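The two cases above can be reproduced with a small sketch of the backtracking traversal. It is only an illustration under stated assumptions: the consistency check enumerates integer assignments in an assumed box (`DOMAIN`) instead of running Simplex, the intermediate variable int is eliminated by hand, and all names (`sum_rules`, `root_ok`, `backtrack`, `napier_bounds`) are hypothetical. Rule set 3 is encoded as it is used in this chapter.

```python
from itertools import product

VARS = ("H", "OH", "A", "AH")   # orders of magnitude of [H+], [OH-], [A-], [AH]
DOMAIN = range(-16, 1)          # assumed search box for the brute-force check

def sum_rules(q1, q2, q):
    """Rule set 3 for q = q1 + q2 (same signs): one predicate per disjunct."""
    return {
        "3a": lambda p: p[q1] == p[q2] and p[q1] <= p[q] <= p[q1] + 1,
        "3b": lambda p: p[q1] > p[q2] and p[q] == p[q1],
        "3c": lambda p: p[q1] < p[q2] and p[q] == p[q2],
    }

def root_ok(p):
    """Rule 1 for the product terms at the root level."""
    kw_ok = p["OH"] + p["H"] <= p["Kw"] <= p["OH"] + p["H"] + 1
    # Ka*[AH] = int = [A-]*[H+]: an integer om(int) exists iff the two
    # rule-1 ranges overlap, i.e. their lower ends differ by at most 1.
    int_ok = abs((p["Ka"] + p["AH"]) - (p["A"] + p["H"])) <= 1
    return kw_ok and int_ok

def backtrack(points, levels, path, leaves):
    """Depth-first traversal; `points` are the assignments consistent so far."""
    if not points:            # inconsistent node: backtrack to the parent
        return
    if not levels:            # consistent leaf: remember its assignments
        leaves[path] = points
        return
    for rule, holds in levels[0].items():
        backtrack([p for p in points if holds(p)], levels[1:], path + (rule,), leaves)

def napier_bounds(exogenous):
    levels = [sum_rules("A", "OH", "H"),    # [H+] = [A-] + [OH-]
              sum_rules("A", "AH", "Ca")]   # Ca   = [A-] + [AH]
    candidates = ({**exogenous, **dict(zip(VARS, vals))}
                  for vals in product(DOMAIN, repeat=len(VARS)))
    leaves = {}
    backtrack([p for p in candidates if root_ok(p)], levels, (), leaves)
    merged = [p for pts in leaves.values() for p in pts]
    if not merged:            # no consistent leaf at all
        return [], {}
    # Least lower bound and greatest upper bound over all consistent leaves.
    bounds = {v: (min(p[v] for p in merged), max(p[v] for p in merged)) for v in VARS}
    return sorted(leaves), bounds

paths, bounds = napier_bounds({"Kw": -14, "Ka": -2, "Ca": -5})
print(paths)    # [('3b', '3b')]
print(bounds)   # {'H': (-5, -5), 'OH': (-10, -9), 'A': (-5, -5), 'AH': (-9, -7)}
```

With `"Ca": -8` the same call reports two consistent leaves and the bounds of the weak-solution case. Enumeration is used here only because it keeps the sketch self-contained; it does not scale the way linear programming does.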

In addition to its role in justifying order of magnitude assumptions, NAPIER’s
predictions can also be used directly. For example, if all the chemist is interested in is
the approximate pH of the solution,^6 then NAPIER’s predictions can be used directly:
in the first case, the pH is between 4 and 5; in the second case, the pH is between
6 and 7. Note that NAPIER was able to make these predictions using approximate
values of $C_a$, $K_w$, and $K_a$. This feature makes it particularly useful during conceptual
design.

^5 $\mathit{om}(q) = (l, u)$ represents the fact that $l \le \mathit{om}(q) \le u$.

^6 The pH of a solution is defined to be $-\log_{10}[H^+]$.
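The step from an order of magnitude to a pH range is simple arithmetic, sketched below. The helper name `ph_interval` is hypothetical, and base-10 orders of magnitude are assumed.

```python
def ph_interval(om_h):
    """pH range implied by om([H+]) = om_h, with base-10 orders of magnitude.

    om([H+]) = n means 10**n <= [H+] < 10**(n + 1), so -n - 1 < pH <= -n.
    """
    return (-om_h - 1, -om_h)

print(ph_interval(-5))   # (4, 5)
print(ph_interval(-7))   # (6, 7)
```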


7.3  Order  of  magnitude  reasoning  is  intractable 

The backtracking algorithm described in the previous section generates a tree whose
worst case size is exponential in the number of sum and difference expressions. In
this section we show that order of magnitude reasoning using the rules in Figure 7.3
is  intractable.  Clearly,  one  source  of  intractability  is  that  the  order  of  magnitude  of 
a  parameter  is  integral,  so  that  consistency  checks  and  bounds  computations  require 
integer  programming.  However,  we  now  show  that  order  of  magnitude  reasoning 
remains  intractable  even  if  orders  of  magnitude  are  not  required  to  be  integral.  An 
immediate  consequence  of  this  result  is  that  NAPIER  can  do  little  better  than  generate 
a  backtrack  tree  whose  worst  case  size  is  exponential. 

We  start  by  defining  the  decision  problem  corresponding  to  finding  the  maximum 
order  of  magnitude  of  a  parameter: 

Definition 7.1 (ORDER OF MAGNITUDE REASONING) Let E be a set of equations,
and let V be the set of parameters used in E. Let $X \subseteq V$ be the set of exogenous
parameters, with known orders of magnitude. Let $q \in V$ be a parameter and let B
be an integer. Let $s : V \to \{+, -, \text{unknown}\}$ be a function that assigns signs to
the parameters in V. (Parameters with unknown signs are assigned “unknown.”)
Assuming that the order of magnitude of a parameter is not required to be integral,
is the maximum value of $\mathit{om}(q)$, derived using the rules in Figure 7.3 on the set E,
greater than or equal to B?

We  now  show  that  the  above  problem  is  NP-complete.  The  proof  of  this  theorem 
is  based  on  a  reduction  from  an  arbitrary  instance  of  3SAT.  Briefly,  the  reduction 
introduces a parameter for each literal in the instance of 3SAT. Equations are added
to ensure that parameters corresponding to complementary literals have the property
that the order of magnitude of one of them must be 0 and the order of magnitude
of the other one must be 1. The mapping between truth assignments and orders of
magnitude is straightforward: a literal is true if and only if the corresponding
parameter’s order of magnitude is 1. Additional equations involving the above parameters
and  the  special  parameter  q  are  then  introduced,  and  the  bound  B  is  defined  to  ensure 
that  the  maximum  value  of  om{q)  is  greater  than  or  equal  to  B  if  and  only  if  all  the 
clauses  are  satisfied.  The  details  of  this  proof  are  as  follows: 

Theorem  7.1  The  ORDER  OF  MAGNITUDE  REASONING  problem  is  NP-complete. 

Proof:  It  is  easy  to  see  that  the  ORDER  OF  MAGNITUDE  REASONING  problem  is  in 
NP since a non-deterministic algorithm can proceed by (a) for each sum and difference
term in E, guessing a rule (a, b, or c) from rule sets 3, 4, or 5, as applicable; and
(b) using linear programming on the resulting set of inequalities to see if the maximum
value of $\mathit{om}(q)$ is greater than or equal to B. Since linear programming is known to be in P [Khachian,
1979],  it  follows  that  the  ORDER  OF  MAGNITUDE  REASONING  problem  is  in  NP. 

To  show  that  the  ORDER  OF  MAGNITUDE  REASONING  problem  is  NP-hard,  we 
reduce  an  arbitrary  instance  of  3SAT  to  an  instance  of  the  ORDER  OF  MAGNITUDE 
REASONING problem. Let $I_1$ be an arbitrary instance of 3SAT consisting of a set
$U = \{u_1, \ldots, u_n\}$ of boolean variables, and a set $C = \{c_1, \ldots, c_m\}$ of three literal clauses.
We now reduce $I_1$ to an instance, $I_2$, of the ORDER OF MAGNITUDE REASONING
problem.

For each boolean variable $u_i \in U$, $1 \le i \le n$, add the following 6 equations to E,
and the corresponding parameters to V:

\[
\begin{array}{rcll}
v_i & = & x_{i1} * x_{i2} & \qquad (7.6) \\
\bar{v}_i & = & \bar{x}_{i1} * \bar{x}_{i2} & \qquad (7.7) \\
y_{i1} & = & v_i * \bar{v}_i & \qquad (7.8) \\
y_i & = & y_{i1} + y_{i2} & \qquad (7.9) \\
z_i & = & v_i - \bar{v}_i & \qquad (7.10) \\
z_i & = & (z_{i1} + z_{i2}) * z_{i3} & \qquad (7.11)
\end{array}
\]

Add $x_{i1}$, $x_{i2}$, $\bar{x}_{i1}$, $\bar{x}_{i2}$, $y_i$, $z_{i1}$, and $z_{i3}$ to the set X of exogenous parameters. Define the
orders of magnitude of these parameters as follows:

\[ \mathit{om}(x_{i1}) = \mathit{om}(x_{i2}) = \mathit{om}(\bar{x}_{i1}) = \mathit{om}(\bar{x}_{i2}) = \mathit{om}(z_{i3}) = 0 \qquad (7.12) \]

\[ \mathit{om}(y_i) = \mathit{om}(z_{i1}) = 1 \qquad (7.13) \]

For each clause $c_j \in C$, $1 \le j \le m$, with literals $l_{j1}$, $l_{j2}$, and $l_{j3}$, add the equation

\[ (((g_j + f_{j1}) + f_{j2}) + f_{j3}) = h_j \qquad (7.14) \]

where the parameter $f_{jk}$ is $v_i$ if $l_{jk}$ is $u_i$, and $\bar{v}_i$ if $l_{jk}$ is $\bar{u}_i$, for some $1 \le i \le n$. Add
$g_j$ and $h_j$ to V. Add $g_j$ to X and define its order of magnitude as follows:

\[ \mathit{om}(g_j) = 1 \qquad (7.15) \]

Add the following equation to E:

\[ h_1 * h_2 * \cdots * h_m = q \qquad (7.16) \]

Let s be such that all the parameters in V, except $z_i$ and $z_{i3}$ ($1 \le i \le n$), are
positive, and let the signs of $z_i$ and $z_{i3}$ be unknown. Let B be $3m - 1$.
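The construction can be sketched as a short generator of equations. This is an illustration only: clauses are given as tuples of signed integers (`+i` for $u_i$, `-i` for its complement), which is an assumed input convention, and the function name `reduce_3sat` and the textual parameter names (`vbar1`, `x1_1`, and so on) are hypothetical.

```python
def reduce_3sat(n, clauses):
    """Build the equations, exogenous orders of magnitude, and bound B."""
    equations, exogenous = [], {}
    for i in range(1, n + 1):
        v, vbar = f"v{i}", f"vbar{i}"
        equations += [
            f"{v} = x{i}_1 * x{i}_2",                # (7.6)
            f"{vbar} = xbar{i}_1 * xbar{i}_2",       # (7.7)
            f"y{i}_1 = {v} * {vbar}",                # (7.8)
            f"y{i} = y{i}_1 + y{i}_2",               # (7.9)
            f"z{i} = {v} - {vbar}",                  # (7.10)
            f"z{i} = (z{i}_1 + z{i}_2) * z{i}_3",    # (7.11)
        ]
        for name in (f"x{i}_1", f"x{i}_2", f"xbar{i}_1", f"xbar{i}_2", f"z{i}_3"):
            exogenous[name] = 0                      # (7.12)
        exogenous[f"y{i}"] = 1                       # (7.13)
        exogenous[f"z{i}_1"] = 1                     # (7.13)
    for j, clause in enumerate(clauses, start=1):
        # f_jk is v_i for a positive literal and vbar_i for a negated one.
        f = [f"v{lit}" if lit > 0 else f"vbar{-lit}" for lit in clause]
        equations.append(f"((g{j} + {f[0]}) + {f[1]}) + {f[2]} = h{j}")   # (7.14)
        exogenous[f"g{j}"] = 1                       # (7.15)
    product_term = " * ".join(f"h{j}" for j in range(1, len(clauses) + 1))
    equations.append(f"{product_term} = q")          # (7.16)
    return equations, exogenous, 3 * len(clauses) - 1
```

For n variables and m clauses the sketch emits 6n + m + 1 equations, which makes the polynomial size of the reduction easy to see.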


That completes the reduction. Clearly, it can be done in polynomial time. We
now show that any assignment of orders of magnitude to the parameters of $I_2$ that
satisfies Equations 7.6-7.13, according to the rules in Figure 7.3, assigns the order of
magnitude 1 to exactly one of $v_i$ and $\bar{v}_i$, for each $1 \le i \le n$, and 0 to the other.

From rule 1 applied to Equation 7.6 we have:

\[ \mathit{om}(x_{i1}) + \mathit{om}(x_{i2}) \le \mathit{om}(v_i) \le \mathit{om}(x_{i1}) + \mathit{om}(x_{i2}) + 1 \qquad (7.17) \]

Substituting the orders of magnitude of $x_{i1}$ and $x_{i2}$ (Equation 7.12) into the above
equation, we have

\[ 0 \le \mathit{om}(v_i) \le 1 \qquad (7.18) \]

In a similar way, rule 1 applied to Equation 7.7 implies that:

\[ 0 \le \mathit{om}(\bar{v}_i) \le 1 \qquad (7.19) \]

Hence, $\mathit{om}(v_i)$ and $\mathit{om}(\bar{v}_i)$ are either 0 or 1. Applying rule 1 to Equation 7.8 leads to:

\[ \mathit{om}(v_i) + \mathit{om}(\bar{v}_i) \le \mathit{om}(y_{i1}) \le \mathit{om}(v_i) + \mathit{om}(\bar{v}_i) + 1 \qquad (7.20) \]

Equation 7.9 implies that $\mathit{om}(y_i) \ge \mathit{om}(y_{i1})$. This fact follows from the observation
that in rule 3:

\[ \mathit{om}(q_1 + q_2) \ge \mathit{om}(q_1) \qquad (7.21) \]

\[ \mathit{om}(q_1 + q_2) \ge \mathit{om}(q_2) \qquad (7.22) \]

under all three conditions. Since $\mathit{om}(y_i) = 1$ (Equation 7.13), it follows that

\[ \mathit{om}(y_{i1}) \le 1 \qquad (7.23) \]

Hence, from Equations 7.20 and 7.23 it follows that:

\[ \mathit{om}(v_i) + \mathit{om}(\bar{v}_i) \le 1 \qquad (7.24) \]

Hence, $\mathit{om}(v_i)$ and $\mathit{om}(\bar{v}_i)$ cannot both be 1.

Now, since $\mathit{om}(z_{i1}) = 1$ (Equation 7.13), it follows from Equation 7.21 that

\[ \mathit{om}(z_{i1} + z_{i2}) \ge 1 \qquad (7.25) \]

Rule 1 applied to Equation 7.11 leads to:

\[ \mathit{om}(z_{i1} + z_{i2}) + \mathit{om}(z_{i3}) \le \mathit{om}(z_i) \qquad (7.26) \]

Hence, from Equations 7.25 and 7.12, it follows that:

\[ \mathit{om}(z_i) \ge 1 \qquad (7.27) \]

Now, rule 4 implies that

\[ \mathit{om}(q_1 - q_2) \le \mathrm{maximum}\{\mathit{om}(q_1), \mathit{om}(q_2)\} \qquad (7.28) \]

Hence, from Equations 7.10, 7.18, 7.19, and 7.28, it follows that at least one of $\mathit{om}(v_i)$
and $\mathit{om}(\bar{v}_i)$ must be 1. But earlier we had inferred that at most one of $\mathit{om}(v_i)$ and
$\mathit{om}(\bar{v}_i)$ can be 1. Hence, exactly one of $\mathit{om}(v_i)$ and $\mathit{om}(\bar{v}_i)$ can be 1, with the other
being 0.
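The argument above can be checked mechanically by enumerating integer values against the derived inequalities. The sketch below is not part of the proof: it tests only Equations 7.18-7.20, 7.23, 7.27, and 7.28 (the last applied to Equation 7.10), the search range is an arbitrary assumption, and the function name `gadget_assignments` is hypothetical.

```python
from itertools import product

def gadget_assignments(lo=-3, hi=4):
    """Pairs (om(v_i), om(vbar_i)) compatible with the derived inequalities."""
    feasible = set()
    for v, vbar, y1, z in product(range(lo, hi + 1), repeat=4):
        ok = (0 <= v <= 1                          # (7.18)
              and 0 <= vbar <= 1                   # (7.19)
              and v + vbar <= y1 <= v + vbar + 1   # (7.20)
              and y1 <= 1                          # (7.23)
              and z >= 1                           # (7.27)
              and z <= max(v, vbar))               # (7.28) applied to (7.10)
        if ok:
            feasible.add((v, vbar))
    return feasible

print(sorted(gadget_assignments()))   # [(0, 1), (1, 0)]
```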

We now show that $I_1$ has a satisfying truth assignment if and only if $I_2$ is such
that the maximum value of $\mathit{om}(q)$ is greater than or equal to B.

($\Rightarrow$) Let $I_1$ have a satisfying truth assignment. We now assign orders of magnitude
to the parameters of $I_2$ such that Equations 7.6-7.16 are satisfied according to the
rules of Figure 7.3. For $1 \le i \le n$, if $u_i$ is true, then let $\mathit{om}(v_i)$ be 1 and $\mathit{om}(\bar{v}_i)$
be 0; otherwise let $\mathit{om}(v_i)$ be 0 and $\mathit{om}(\bar{v}_i)$ be 1. Since exactly one of $\mathit{om}(v_i)$ and
$\mathit{om}(\bar{v}_i)$ is 1 and the other is 0, this assignment of orders of magnitude will satisfy
Equations 7.6-7.13.

Since the truth assignment satisfies every clause $c_j = \{l_{j1}, l_{j2}, l_{j3}\}$, $1 \le j \le m$,
it follows that at least one of $l_{j1}$, $l_{j2}$, or $l_{j3}$ is true. Hence, at least one of $f_{j1}$, $f_{j2}$,
or $f_{j3}$ has an order of magnitude of 1. Since $\mathit{om}(g_j) = 1$ (Equation 7.15), it follows
from Equation 7.14 and rule 3 that the maximum value of $\mathit{om}(h_j)$ is 2. Hence, from
Equation 7.16 and rule 1, it follows that the maximum value of $\mathit{om}(q)$ is $3m - 1$ ($2m$ from
the $\mathit{om}(h_j)$, and $m - 1$ from the product of m factors). Hence, the maximum
value of $\mathit{om}(q)$ is greater than or equal to B.

($\Leftarrow$) Let us now assume that the maximum value of $\mathit{om}(q)$ is greater than or equal
to B. Consider the assignment of orders of magnitude to parameters that supports
$\mathit{om}(q)$ taking on its maximum value. For each variable $u_i$, $1 \le i \le n$, let $u_i$ be true if
and only if the order of magnitude of parameter $v_i$ is 1 in the above assignment. This
gives us a well defined truth assignment since we have already shown that exactly one
of $\mathit{om}(v_i)$ and $\mathit{om}(\bar{v}_i)$ is 1. To show that this truth assignment satisfies every clause,
we proceed as follows.

Each $f_{jk}$ ($1 \le j \le m$, $1 \le k \le 3$) is either $v_i$ or $\bar{v}_i$, for some i, so Equations 7.18
and 7.19 tell us that the maximum value of $\mathit{om}(f_{jk})$ is 1. Hence, using Equation 7.14
and rule 3, the maximum value of $\mathit{om}(h_j)$ can be 2. However, for the maximum value
of $\mathit{om}(q)$ to be greater than or equal to $B$ ($= 3m - 1$), it follows that every $\mathit{om}(h_j)$ must
be 2. Hence, at least one of the $f_{jk}$ must have an order of magnitude of 1. Hence, at
least one of the $l_{jk}$ will be true, and hence the truth assignment constructed above
will satisfy each clause.

Hence, we have shown that $I_1$ has a satisfying truth assignment if and only if $I_2$
is such that the maximum value of $\mathit{om}(q)$ is greater than or equal to B. Hence, the
ORDER OF MAGNITUDE REASONING problem is NP-hard. □
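The counting step in both directions of the proof (an order of magnitude of 2 for each $h_j$, plus $m - 1$ from the product) can be illustrated on a small formula. The sketch below assumes rule set 3 for positive terms as it is used in this chapter; the function names and the signed-integer clause encoding are hypothetical.

```python
def add_oms(possible, om_f):
    """Rule set 3 on a set of candidate orders of magnitude (all terms positive)."""
    out = set()
    for om_sum in possible:
        if om_sum == om_f:
            out |= {om_sum, om_sum + 1}     # rule 3a
        else:
            out.add(max(om_sum, om_f))      # rules 3b and 3c
    return out

def max_om_q(clauses, truth):
    """Largest om(q) when om(v_i) = 1 exactly for the variables set to True."""
    total = 0
    for clause in clauses:
        possible = {1}                                  # om(g_j) = 1, Equation 7.15
        for lit in clause:
            om_f = 1 if truth[abs(lit)] == (lit > 0) else 0
            possible = add_oms(possible, om_f)
        total += max(possible)                          # best achievable om(h_j)
    return total + len(clauses) - 1                     # rule 1 over m factors

clauses = [(1, 2, 3), (-1, -2, -3)]
print(max_om_q(clauses, {1: True, 2: False, 3: False}))   # 5, i.e. 3m - 1
print(max_om_q(clauses, {1: True, 2: True, 3: True}))     # 4, below the bound
```

The bound $3m - 1$ is reached exactly for the truth assignments that satisfy every clause.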

The  intractability  of  the  ORDER  OF  MAGNITUDE  REASONING  problem  tells  us 
that,  in  the  worst  case,  NAPIER  will  have  to  generate  a  backtrack  tree  whose  size  is 
exponential in the number of sum and difference terms. Unfortunately, the exponential
blow up does occur in practice. Table 7.1 summarizes NAPIER’s performance on
models  of  ten  different  devices  (these  devices  are  described  in  Appendix  B). 


                                      Total # of   # of +/-   Time (sec),      Time (sec), with
Device                                equations    terms      all equations    causal ordering

Bimetallic strip temperature gauge        28          11          2733               2.0
Bimetallic strip thermostat               31          11          2435               1.0
Flexible link temperature gauge           45          14           -                 2.9
Electromagnetic relay thermostat          60          24           -                 2.7
Galvanometer temperature gauge            80          25           -                37.2
Electric bell                            110          32           -                35.9
Magnetic sizing device                   111          32           -                94.6
Carbon pile regulator                    119          35           -                20.4
Tachometer                               145          43           -                45.2
Car distributor system                   163          50           -                21.0

Table 7.1: NAPIER’s run times on an Explorer II, with and without causal ordering.


The  second  column  in  this  table  shows  the  total  number  of  equations  in  each 
example, while the third column shows the total number of sum and difference
terms. The fourth column shows the time it took NAPIER to run its backtracking
algorithm on the complete set of equations. (The fifth column will be discussed in
the next section.) NAPIER was given a maximum of one hour to solve each example;
a “-” entry in column four denotes that NAPIER could not solve the example in an
hour.  As  is  clear  from  the  table,  only  the  two  smallest  examples  could  be  solved  in 
under  an  hour,  each  taking  over  40  minutes.  Hence,  NAPIER  appears  to  be  quite 
impractical,  except  for  the  smallest  examples.  To  make  it  practical,  we  now  develop 
an  approximate  reasoning  scheme  for  NAPIER  that  trades  off  accuracy  for  speed. 

7.4 Approximation algorithms in NAPIER

The  backtrack  tree  developed  by  NAPIER  is,  in  the  worst  case,  exponential  in  the 
number  of  sum  and  difference  terms  in  the  set  of  equations  under  consideration. 
Hence,  to  make  NAPIER  practically  useful,  it  is  important  to  decrease  the  number  of 
sum  and  difference  terms  that  are  handled  at  any  one  time.  We  now  discuss  a  method 
for  doing  this,  based  on  a  dependency  ordering  of  the  equations. 


7.4.1  Ordering  the  equations 

The dependency ordering of equations that we consider is the causal ordering,
described in Chapter 3. The causal ordering specifies the order in which equations are
to be solved, and identifies minimal sets of equations that must be solved
simultaneously. The causal ordering can be viewed as a directed acyclic graph. Each node in
the graph consists of a minimal set of equations that must be solved simultaneously.
There is an edge from node $n_1$ to node $n_2$ if the equations at $n_2$ use a parameter
whose value is determined by the equations at $n_1$.

NAPIER  processes  the  equation  sets  in  the  order  specified  by  the  causal  ordering: 
equation  sets  earlier  in  the  ordering  are  processed  first.  NAPIER  bounds  the  orders 
of  magnitudes  of  the  parameters  used  in  an  equation  set,  and  uses  these  bounds  as 
exogenous  bounds  for  equation  sets  later  in  the  ordering. 

The use of the above dependency ordering has a significant computational
advantage. A large set of equations, with many sum and difference terms, can often
be  broken  down  into  many  small  sets  of  equations,  with  each  equation  set  having 
very  few  sum  and  difference  terms.  Hence,  NAPIER  can  process  each  equation  set  in 
the  dependency  ordering  very  fast.  Column  five  in  Table  7.1  shows  the  time  it  took 
NAPIER  to  solve  the  ten  examples  using  causal  ordering.  It  takes  NAPIER  from  a  few 
seconds  to  under  two  minutes  to  solve  each  of  these  examples,  showing  that  causal 
ordering  has  made  NAPIER  practical  for  large  sets  of  equations. 
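The processing order can be sketched with the standard library's topological sorter. This is a hedged illustration, not NAPIER's code: the input format (each node mapped to the parameters it determines and the parameters it uses), the function name `causal_order`, and the node names in the example are all assumptions.

```python
from graphlib import TopologicalSorter

def causal_order(equation_sets):
    """equation_sets maps a node name to (determined parameters, used parameters)."""
    producer = {param: node
                for node, (determined, _) in equation_sets.items()
                for param in determined}
    # Edge n1 -> n2 whenever n2 uses a parameter determined at n1.
    predecessors = {
        node: {producer[p] for p in used if p in producer and producer[p] != node}
        for node, (_, used) in equation_sets.items()
    }
    return list(TopologicalSorter(predecessors).static_order())

# Hypothetical equation sets: N2 needs a from N1, N3 needs a and b.
sets = {
    "N1": ({"a"}, {"x"}),
    "N2": ({"b"}, {"a"}),
    "N3": ({"c"}, {"a", "b"}),
}
print(causal_order(sets))   # ['N1', 'N2', 'N3']
```

Each node would then be handed to the backtracking procedure in this order, with the bounds of earlier nodes treated as exogenous bounds for later ones.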

7.4.2 Loss of accuracy

The  drawback  of  using  the  dependency  ordering  is  that  global  constraints  can  be 
lost,  leading  to  excessively  loose  bounds  on  the  orders  of  magnitudes.  Consider,  for 
example, the set $\{y_1 = x_1 * x_2,\ y_2 = x_3 / y_1,\ y_3 = y_1 * y_2\}$, and let $x_1$, $x_2$, and $x_3$ be
exogenous with orders of magnitude 0. The dependency ordering generated from this
set of equations is:

                   {y2 = x3 / y1}
                  /              \
{y1 = x1 * x2} ------------------> {y3 = y1 * y2}

Using this dependency ordering, NAPIER computes the order of magnitude of $y_3$ as
follows: from the first equation it computes $\mathit{om}(y_1)$ to be between 0 and 1; from
the second equation, and the calculated bound on $\mathit{om}(y_1)$, it computes $\mathit{om}(y_2)$ to be
between $-2$ and 0; and from the third equation and the calculated bounds on $\mathit{om}(y_1)$
and $\mathit{om}(y_2)$, it computes $\mathit{om}(y_3)$ to be between $-2$ and 2. However, if all three
equations were considered simultaneously, NAPIER would compute $\mathit{om}(y_3)$ to be between
$-1$ and 1.

The  reason  for  the  looser  bound  in  the  first  case  stems  from  not  enforcing  some 
global constraints. For example, the lower bound of $\mathit{om}(y_3)$ can be $-2$ only when
$\mathit{om}(y_1) = 0$ and $\mathit{om}(y_2) = -2$. However, when $\mathit{om}(y_1)$ is 0, the second equation
dictates  that  the  lowest  that  om(y2)  can  be  is  —1.  This  fact  is  lost  when  the  third 
equation  is  processed  by  itself. 
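The numbers in this example can be reproduced with interval arithmetic on orders of magnitude. The sketch below assumes rule 1 in the form used in Equation 7.17 and a quotient rule obtained by rearranging it; the helper names are hypothetical.

```python
def product_bounds(a, b):
    """Rule 1 on intervals: om(a) + om(b) <= om(a * b) <= om(a) + om(b) + 1."""
    return (a[0] + b[0], a[1] + b[1] + 1)

def quotient_bounds(a, b):
    """Quotient counterpart, obtained by rearranging rule 1 for a = (a / b) * b."""
    return (a[0] - b[1] - 1, a[1] - b[0])

def values(interval):
    return range(interval[0], interval[1] + 1)

x = (0, 0)                                  # om(x1) = om(x2) = om(x3) = 0

# Equation-by-equation propagation along the dependency ordering.
y1 = product_bounds(x, x)                   # (0, 1)
y2 = quotient_bounds(x, y1)                 # (-2, 0)
y3_ordered = product_bounds(y1, y2)         # (-2, 2)

# All three equations together: om(y2) is tied to the chosen om(y1).
reachable = set()
for om_y1 in values(y1):
    for om_y2 in values(quotient_bounds(x, (om_y1, om_y1))):
        reachable.update(values(product_bounds((om_y1, om_y1), (om_y2, om_y2))))
y3_simultaneous = (min(reachable), max(reachable))   # (-1, 1)

print(y3_ordered, y3_simultaneous)
```

The looser interval arises because the equation-by-equation pass pairs $\mathit{om}(y_1) = 0$ with $\mathit{om}(y_2) = -2$, a combination that the second equation rules out.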

More generally, the above problem occurs when a parameter, like $y_3$, depends
on two or more parameters, like $y_1$ and $y_2$, whose values have been determined by
equations that are earlier in the causal ordering. In using these previously determined
values, NAPIER disregards any additional constraints that might hold between those
values. Hence, bounds computed based on these values may not be as tight as possible.

NAPIER  can  partially  address  this  problem  by  combining  adjacent  sets  of  equations 
in  the  dependency  ordering.  This  allows  more  equations  to  be  handled  simultaneously, 
so  that  more  global  constraints  can  be  incorporated.  However,  combining  adjacent 
sets  of  equations  can  lead  to  an  increase  in  the  number  of  sum  and  difference  terms 
that  must  be  handled  simultaneously.  Hence,  adjacent  sets  are  combined  only  when 
the number of sum and difference terms in the resulting set does not increase beyond
a threshold (call this threshold A).


Device                                Maximum A

Bimetallic strip temperature gauge       11
Bimetallic strip thermostat              11
Flexible link temperature gauge          10
Electromagnetic relay thermostat          9
Galvanometer temperature gauge            9
Electric bell                            12
Magnetic sizing device                    7
Carbon pile regulator                     9
Tachometer                                7
Car distributor system                    9

Table 7.2: Maximum value of A for each example.


Combining adjacent sets of equations, as described above, also allows a partial
empirical evaluation of the effect of causal ordering on accuracy. We ran NAPIER
a number of times on each of our examples, using increasing values of A, allowing a
maximum of one hour per run. Table 7.2 shows the maximum value of A used for
each  example.  We  then  compared  the  bounds  that  were  computed  without  combining 
adjacent  sets  with  the  bounds  that  were  computed  with  the  maximum  setting  of  A. 
Interestingly,  we  found  that  there  was  no  loss  of  accuracy — the  bounds  computed 
with  and  without  combining  adjacent  sets  were  identical. 

To  understand  the  reason  for  this  somewhat  surprising  result,  we  now  analyze  the 
source of the additional constraints on previously determined values. Let us assume
that $\mathit{om}(p_3)$ is computed using previously computed values of $\mathit{om}(p_1)$ and $\mathit{om}(p_2)$.
Additional constraints on the values of $\mathit{om}(p_1)$ and $\mathit{om}(p_2)$ stem from one of two
sources: (a) $\mathit{om}(p_1)$ and $\mathit{om}(p_2)$ are determined simultaneously; and (b) the value
of $\mathit{om}(p_1)$ is used in computing the value of $\mathit{om}(p_2)$, i.e., the value of one of these
parameters depends on the value of the other. Point (a) manifests itself as a node in
the  causal  ordering  which  contains  more  than  one  equation.  Point  (b)  manifests  itself 
as  multiple  paths  between  two  nodes  in  the  causal  ordering. 

Hence, if the causal ordering, viewed as a graph, satisfies the following two
properties:


1.  each  node  contains  exactly  one  equation;  and 


2.  there  is  at  most  one  path  between  any  two  nodes; 


then  we  can  show  that  there  will  be  no  additional  constraints  between  previously 
determined  values.  Hence,  there  is  no  loss  of  accuracy  in  using  the  causal  ordering. 
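Both properties are easy to test on a given causal ordering. The following is a minimal sketch under assumptions: paths are taken to be directed paths, the graph is given as a successor map, and the function names are hypothetical.

```python
def count_paths(successors, src, dst):
    """Number of directed paths from src to dst in an acyclic graph."""
    if src == dst:
        return 1
    return sum(count_paths(successors, nxt, dst) for nxt in successors.get(src, ()))

def satisfies_both_properties(equations_per_node, successors):
    """Property 1: one equation per node. Property 2: at most one path per pair."""
    one_equation = all(count == 1 for count in equations_per_node.values())
    nodes = list(equations_per_node)
    single_path = all(count_paths(successors, a, b) <= 1
                      for a in nodes for b in nodes if a != b)
    return one_equation and single_path

sizes = {"n1": 1, "n2": 1, "n3": 1}
chain = {"n1": ["n2"], "n2": ["n3"]}
triangle = {"n1": ["n2", "n3"], "n2": ["n3"]}   # shape of the three-equation example
print(satisfies_both_properties(sizes, chain))      # True
print(satisfies_both_properties(sizes, triangle))   # False
```

The triangle fails the second property because its last node can be reached from its first node both directly and through the middle node, which is the situation that loosened the bound on $\mathit{om}(y_3)$ earlier in this section.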


                                      Equations per node     # of extra
Device                                Maximum    Average     edges

Bimetallic strip temperature gauge        7        1.27         1
Bimetallic strip thermostat               7        1.24         0
Flexible link temperature gauge           7        1.15         1
Electromagnetic relay thermostat          1        1.00         0
Galvanometer temperature gauge           12        1.29         1
Electric bell                            18        1.29         2
Magnetic sizing device                   17        1.26         6
Carbon pile regulator                     9        1.25         2
Tachometer                               18        1.21         3
Car distributor system                   16        1.10         0

Table 7.3: Properties of the causal ordering graph


Table  7.3  shows  how  closely  the  causal  orderings  generated  from  our  examples 
match  the  above  two  properties.  The  second  and  third  columns  of  this  table  show 
the  maximum  and  average  number  of  equations  per  node,  respectively.  One  can  see 
that,  in  all  cases,  the  average  number  of  equations  per  node  is  very  close  to  1.  The 
fourth  column  shows  the  minimum  number  of  edges  that  must  be  removed  from  the 
causal  ordering  to  ensure  that  there  is  at  most  one  path  between  any  two  nodes.  One 
can  see  that,  in  most  cases  these  numbers  are  very  small.  Hence,  the  above  analysis 
provides  us  with  insight  into  the  reasons  underlying  the  fact  that,  in  our  examples, 
the  bounds  computed  with  and  without  combining  adjacent  sets  are  identical. 

7.5 Error estimation

In  this  section,  we  estimate  the  error  introduced  by  the  use  of  the  heuristic  rules 
introduced in section 7.2.1. We then analyze some alternate order of magnitude rules,
that seem intuitively plausible, and show that these rules introduce unacceptably large
errors. The analysis is done using probability theory and is based on interpreting each
parameter as a random variable.^7 The analysis also uses two assumptions, and we
conclude with a discussion of the validity of these assumptions.

7.5.1  Estimating  the  error  of  heuristic  rules 

In  this  section  we  analyze  the  error  introduced  by  the  heuristic  order  of  magnitude 
rules  3b,  and  4b.  (Rules  3c  and  4c  are  similar  to  rules  3b  and  4b,  respectively,  and 
are  not  discussed.)  The  remaining  rules  do  not  introduce  errors  in  the  sense  that  the 
bounds  predicted  by  them  are  guaranteed  to  be  conservative,  i.e.,  correct  though  not 
necessarily  tight. 

We start by analyzing rule 3b. Let $Q$, $Q_1$, and $Q_2$ be parameters such that
$Q = Q_1 + Q_2$. Let $f_{Q_1}$ and $f_{Q_2}$ be the probability density functions of $Q_1$ and
$Q_2$, respectively, and let $f_{Q_1,Q_2}$ be their joint probability density function. (Briefly,
$f_{Q_1}(q_1)\,dq_1$ is the probability that $Q_1$ lies between $q_1$ and $q_1 + dq_1$, and
$f_{Q_1,Q_2}(q_1, q_2)\,dq_1\,dq_2$ is the probability that $Q_1$ lies between $q_1$ and $q_1 + dq_1$, and
$Q_2$ lies between $q_2$ and $q_2 + dq_2$.) Since $Q = Q_1 + Q_2$, it follows that the probability
that $Q$ lies between $l$ and $u$, for any values $l$ and $u$, is:

\[ \mathrm{Prob}\{l \le Q < u\} = \int_{-\infty}^{\infty} \int_{l - q_1}^{u - q_1} f_{Q_1,Q_2}(q_1, q_2)\, dq_2\, dq_1 \qquad (7.29) \]

Let us now assume that $\mathit{om}(Q_1) = n_1$ and $\mathit{om}(Q_2) = n_2$, with $n_1 > n_2$. Under these
conditions, rule 3b states that $\mathit{om}(Q) = n_1$, i.e., $b^{n_1} \le Q < b^{n_1+1}$. To estimate the
error, $e(\text{Rule 3b})$, in rule 3b, we must calculate the probability that $Q$ lies outside the
region from $b^{n_1}$ to $b^{n_1+1}$:

\[
\begin{array}{rcll}
e(\text{Rule 3b}) & = & 1 - \mathrm{Prob}\{b^{n_1} \le Q < b^{n_1+1}\} & \qquad (7.30) \\[4pt]
 & = & 1 - \displaystyle\int_{-\infty}^{\infty} \int_{b^{n_1} - q_1}^{b^{n_1+1} - q_1} f_{Q_1,Q_2}(q_1, q_2)\, dq_2\, dq_1 & \qquad (7.31)
\end{array}
\]

^7 See [Davenport, 1970] for an introduction to probability theory and random variables.

To  evaluate  this  integral,  we  make  the  following  assumptions: 

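Assumption 1: $Q_1$ and $Q_2$ are independent random variables. Hence, the joint
probability density of $Q_1$ and $Q_2$ is just the product of the individual probability
densities:

\[ f_{Q_1,Q_2}(q_1, q_2) = f_{Q_1}(q_1)\, f_{Q_2}(q_2) \qquad (7.32) \]

Assumption 2: $Q_1$ and $Q_2$ are uniformly distributed on the intervals $[b^{n_1}, b^{n_1+1})$ and
$[b^{n_2}, b^{n_2+1})$, respectively:

\[
f_{Q_1}(q_1) = \begin{cases} \dfrac{1}{b^{n_1}(b-1)} & b^{n_1} \le q_1 < b^{n_1+1} \\[6pt] 0 & \text{otherwise} \end{cases}
\qquad
f_{Q_2}(q_2) = \begin{cases} \dfrac{1}{b^{n_2}(b-1)} & b^{n_2} \le q_2 < b^{n_2+1} \\[6pt] 0 & \text{otherwise} \end{cases}
\]

Hence, from Equations 7.31 and 7.32, we get:

\[ e(\text{Rule 3b}) = 1 - \int_{-\infty}^{\infty} \int_{b^{n_1} - q_1}^{b^{n_1+1} - q_1} f_{Q_1}(q_1)\, f_{Q_2}(q_2)\, dq_2\, dq_1 \qquad (7.33) \]

We now use Assumption 2 to split the above integral into two integrals, such that the
integrand in both integrals is a non-zero constant throughout the region of integration:

\[
\begin{array}{rcll}
e(\text{Rule 3b}) & = & 1 - \displaystyle\int_{b^{n_1}}^{b^{n_1+1} - b^{n_2+1}} \int_{b^{n_2}}^{b^{n_2+1}} \frac{dq_2\, dq_1}{b^{n_1+n_2}(b-1)^2}
 - \int_{b^{n_1+1} - b^{n_2+1}}^{b^{n_1+1} - b^{n_2}} \int_{b^{n_2}}^{b^{n_1+1} - q_1} \frac{dq_2\, dq_1}{b^{n_1+n_2}(b-1)^2} & \qquad (7.34) \\[10pt]
 & = & 1 - \displaystyle\int_{b^{n_1}}^{b^{n_1+1} - b^{n_2+1}} \frac{b^{n_2+1} - b^{n_2}}{b^{n_1+n_2}(b-1)^2}\, dq_1
 - \int_{b^{n_1+1} - b^{n_2+1}}^{b^{n_1+1} - b^{n_2}} \frac{b^{n_1+1} - q_1 - b^{n_2}}{b^{n_1+n_2}(b-1)^2}\, dq_1 & \qquad (7.35) \\[10pt]
 & = & \dfrac{b + 1}{2\, b^{n_1 - n_2}(b - 1)} & \qquad (7.36)
\end{array}
\]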

Hence, under Assumptions 1 and 2, the error in rule 3b is maximum when $(n_1 - n_2)$ is minimum, i.e., $(n_1 - n_2) = 1$, which occurs when parameters of consecutive orders of magnitude are being added. When $b = 10$, the maximum error is 6.11%. For larger values of $(n_1 - n_2)$, the errors are even smaller. For example, with $b = 10$ and $(n_1 - n_2) = 2$ (i.e., adding a parameter that is two orders of magnitude smaller), the estimated error is only 0.61%.
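
The closed form in Equation 7.36 can be checked numerically. The following is a minimal sketch, not part of NAPIER: the function names `rule_3b_error` and `simulate_rule_3b` are hypothetical, and the simulation simply draws $Q_1$ and $Q_2$ independently and uniformly, as Assumptions 1 and 2 state, and counts how often the sum leaves the interval predicted by rule 3b.

```python
import random


def rule_3b_error(b: float, n1: int, n2: int) -> float:
    """Closed-form error of rule 3b (Equation 7.36); requires n1 > n2."""
    if n1 <= n2:
        raise ValueError("rule 3b applies only when n1 > n2")
    return (b + 1) / (2 * b ** (n1 - n2) * (b - 1))


def simulate_rule_3b(b: float, n1: int, n2: int,
                     trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same error under Assumptions 1 and 2."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        q1 = rng.uniform(b ** n1, b ** (n1 + 1))  # Q1 uniform on its interval
        q2 = rng.uniform(b ** n2, b ** (n2 + 1))  # Q2 uniform, independent of Q1
        q = q1 + q2
        # Rule 3b predicts om(Q) = n1, i.e. Q stays in [b^n1, b^(n1+1)).
        if not (b ** n1 <= q < b ** (n1 + 1)):
            misses += 1
    return misses / trials


if __name__ == "__main__":
    print(f"closed form, n1 - n2 = 1: {rule_3b_error(10, 3, 2):.4f}")
    print(f"closed form, n1 - n2 = 2: {rule_3b_error(10, 4, 2):.4f}")
    print(f"simulation,  n1 - n2 = 1: {simulate_rule_3b(10, 3, 2):.4f}")
```

The closed form depends only on the difference $n_1 - n_2$, so the particular exponents used in the sketch are arbitrary.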

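The error in rule 4b can also be shown to be $(b+1)/2b^{n_1-n_2}(b-1)$ in a similar way. In particular, if $Q = Q_1 - Q_2$, $\mathit{om}(Q_1) = n_1$, $\mathit{om}(Q_2) = n_2$, and $n_1 > n_2$, then rule 4b predicts that $\mathit{om}(Q) = n_1$. Using Assumptions 1 and 2, $\epsilon(\text{Rule 4b})$ can be calculated as follows:

\begin{align}
\epsilon(\text{Rule 4b}) &= 1 - \mathrm{Prob}\{b^{n_1} < Q < b^{n_1+1}\} \tag{7.37} \\
&= 1 - \int_{-\infty}^{\infty} \int_{q_1-b^{n_1+1}}^{q_1-b^{n_1}} f_{Q_1,Q_2}(q_1,q_2)\, dq_2\, dq_1 \tag{7.38} \\
&= 1 - \int_{-\infty}^{\infty} \int_{q_1-b^{n_1+1}}^{q_1-b^{n_1}} f_{Q_1}(q_1)\, f_{Q_2}(q_2)\, dq_2\, dq_1 \tag{7.39} \\
&= 1 - \int_{b^{n_1}+b^{n_2}}^{b^{n_1}+b^{n_2+1}} \int_{b^{n_2}}^{q_1-b^{n_1}} \frac{1}{b^{n_1+n_2}(b-1)^2}\, dq_2\, dq_1 - \int_{b^{n_1}+b^{n_2+1}}^{b^{n_1+1}} \int_{b^{n_2}}^{b^{n_2+1}} \frac{1}{b^{n_1+n_2}(b-1)^2}\, dq_2\, dq_1 \tag{7.40} \\
&= 1 - \int_{b^{n_1}+b^{n_2}}^{b^{n_1}+b^{n_2+1}} \frac{q_1-b^{n_1}-b^{n_2}}{b^{n_1+n_2}(b-1)^2}\, dq_1 - \int_{b^{n_1}+b^{n_2+1}}^{b^{n_1+1}} \frac{b^{n_2+1}-b^{n_2}}{b^{n_1+n_2}(b-1)^2}\, dq_1 \tag{7.41} \\
&= \frac{b+1}{2\, b^{n_1-n_2}(b-1)} \tag{7.42}
\end{align}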


Hence,  under  Assumptions  1  and  2,  the  maximum  error  in  the  heuristic  rules  of 
section  7.2.1  is  6.11%. 


7.5.2  Alternate  order  of  magnitude  rules 

The above error estimation techniques can also be used to analyze alternate rules for order of magnitude reasoning. In particular, we analyze the three inference rules shown in Figure 7.5. These rules were our first attempt at modeling an engineer's order of magnitude reasoning. Rules 1' and 2' were meant to model reasoning like: “If the resistance, R, is about 10"^ ohms, and the current, I, is about 10”^ amps, then the voltage drop, V (= IR), is about 10"^ volts.” The idea was that the order of magnitude of a product or quotient was the sum or difference, respectively, of the orders of magnitude of the arguments. Rule 3a' was meant to model the intuition that adding parameters of the same order of magnitude results in a parameter of the same order of magnitude. However, we now show that, while these rules may appear intuitively appealing, they are also unacceptably error-prone.

1'   $\mathit{om}(q_1 * q_2) = \mathit{om}(q_1) + \mathit{om}(q_2)$

2'   $\mathit{om}(q_1 / q_2) = \mathit{om}(q_1) - \mathit{om}(q_2)$

3a'  $\mathit{om}(q_1 + q_2) = \mathit{om}(q_1)$  if $\mathit{om}(q_1) = \mathit{om}(q_2)$

Figure 7.5: Alternate rules for order of magnitude reasoning.

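We start by estimating the error in rule 1'. Let $\mathit{om}(Q_1) = n_1$, $\mathit{om}(Q_2) = n_2$, and $Q = Q_1 * Q_2$. Rule 1' predicts that $\mathit{om}(Q) = n_1 + n_2$. Using Assumptions 1 and 2, $\epsilon(\text{Rule 1'})$ can be calculated as follows:

\begin{align}
\epsilon(\text{Rule 1'}) &= 1 - \mathrm{Prob}\{b^{n_1+n_2} < Q < b^{n_1+n_2+1}\} \tag{7.43} \\
&= 1 - \int_{-\infty}^{\infty} \int_{b^{n_1+n_2}/q_1}^{b^{n_1+n_2+1}/q_1} f_{Q_1,Q_2}(q_1,q_2)\, dq_2\, dq_1 \tag{7.44} \\
&= 1 - \int_{-\infty}^{\infty} \int_{b^{n_1+n_2}/q_1}^{b^{n_1+n_2+1}/q_1} f_{Q_1}(q_1)\, f_{Q_2}(q_2)\, dq_2\, dq_1 \tag{7.45} \\
&= 1 - \int_{b^{n_1}}^{b^{n_1+1}} \int_{b^{n_2}}^{b^{n_1+n_2+1}/q_1} \frac{1}{b^{n_1+n_2}(b-1)^2}\, dq_2\, dq_1 \tag{7.46} \\
&= 1 - \int_{b^{n_1}}^{b^{n_1+1}} \frac{b^{n_1+n_2+1}/q_1 - b^{n_2}}{b^{n_1+n_2}(b-1)^2}\, dq_1 \tag{7.47} \\
&= 1 - \frac{b \ln b - b + 1}{(b-1)^2} \tag{7.48}
\end{align}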

Substituting  b=  10  in  Equation  7.48,  the  error  in  rule  1',  under  Assumptions  1  and  2, 
is  82.68%. 

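Next, we estimate the error in rule 2'. Let $\mathit{om}(Q_1) = n_1$, $\mathit{om}(Q_2) = n_2$, and $Q = Q_1/Q_2$. Rule 2' predicts that $\mathit{om}(Q) = n_1 - n_2$. Using Assumptions 1 and 2, $\epsilon(\text{Rule 2'})$ can be calculated as follows:

\begin{align}
\epsilon(\text{Rule 2'}) &= 1 - \mathrm{Prob}\{b^{n_1-n_2} < Q < b^{n_1-n_2+1}\} \tag{7.49} \\
&= 1 - \int_{-\infty}^{\infty} \int_{q_1/b^{n_1-n_2+1}}^{q_1/b^{n_1-n_2}} f_{Q_1,Q_2}(q_1,q_2)\, dq_2\, dq_1 \tag{7.50} \\
&= 1 - \int_{-\infty}^{\infty} \int_{q_1/b^{n_1-n_2+1}}^{q_1/b^{n_1-n_2}} f_{Q_1}(q_1)\, f_{Q_2}(q_2)\, dq_2\, dq_1 \tag{7.51} \\
&= 1 - \int_{b^{n_1}}^{b^{n_1+1}} \int_{b^{n_2}}^{q_1/b^{n_1-n_2}} \frac{1}{b^{n_1+n_2}(b-1)^2}\, dq_2\, dq_1 \tag{7.52} \\
&= 1 - \int_{b^{n_1}}^{b^{n_1+1}} \frac{q_1/b^{n_1-n_2} - b^{n_2}}{b^{n_1+n_2}(b-1)^2}\, dq_1 \tag{7.53} \\
&= \frac{1}{2} \tag{7.54}
\end{align}

Hence, under Assumptions 1 and 2, the error in rule 2' is 50%.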

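Finally, we estimate the error in rule 3a'. Let $\mathit{om}(Q_1) = \mathit{om}(Q_2) = n$, and $Q = Q_1 + Q_2$. Rule 3a' predicts that $\mathit{om}(Q) = n$. Using Assumptions 1 and 2, $\epsilon(\text{Rule 3a'})$ can be calculated as follows:

\begin{align}
\epsilon(\text{Rule 3a'}) &= 1 - \mathrm{Prob}\{b^{n} < Q < b^{n+1}\} \tag{7.55} \\
&= 1 - \int_{-\infty}^{\infty} \int_{b^{n}-q_1}^{b^{n+1}-q_1} f_{Q_1,Q_2}(q_1,q_2)\, dq_2\, dq_1 \tag{7.56} \\
&= 1 - \int_{-\infty}^{\infty} \int_{b^{n}-q_1}^{b^{n+1}-q_1} f_{Q_1}(q_1)\, f_{Q_2}(q_2)\, dq_2\, dq_1 \tag{7.57} \\
&= 1 - \int_{b^{n}}^{b^{n+1}-b^{n}} \int_{b^{n}}^{b^{n+1}-q_1} \frac{1}{b^{2n}(b-1)^2}\, dq_2\, dq_1 \tag{7.58} \\
&= 1 - \int_{b^{n}}^{b^{n+1}-b^{n}} \frac{b^{n+1}-q_1-b^{n}}{b^{2n}(b-1)^2}\, dq_1 \tag{7.59} \\
&= 1 - \frac{(b-2)^2}{2(b-1)^2} \tag{7.60}
\end{align}

Substituting $b = 10$ into Equation 7.60, the error in rule 3a', under Assumptions 1 and 2, is 60.49%.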

Hence, we have shown that, under Assumptions 1 and 2, the errors introduced by the alternate rules shown in Figure 7.5 are greater than or equal to 50%. We believe
that  these  errors  are  unacceptably  large,  and  hence  have  chosen  not  to  include  these 
rules  in  NAPIER. 
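
The three percentages quoted in this section follow directly from Equations 7.48, 7.54, and 7.60. The sketch below only evaluates those closed forms for a given base; the function names are hypothetical and chosen for illustration, and nothing here is taken from NAPIER itself.

```python
import math


def rule_1_prime_error(b: float) -> float:
    """Error of rule 1' (Equation 7.48): 1 - (b ln b - b + 1) / (b - 1)^2."""
    return 1 - (b * math.log(b) - b + 1) / (b - 1) ** 2


def rule_2_prime_error(b: float) -> float:
    """Error of rule 2' (Equation 7.54): one half, independent of the base."""
    return 0.5


def rule_3a_prime_error(b: float) -> float:
    """Error of rule 3a' (Equation 7.60): 1 - (b - 2)^2 / (2 (b - 1)^2)."""
    return 1 - (b - 2) ** 2 / (2 * (b - 1) ** 2)


if __name__ == "__main__":
    base = 10
    for name, fn in [("1'", rule_1_prime_error),
                     ("2'", rule_2_prime_error),
                     ("3a'", rule_3a_prime_error)]:
        print(f"rule {name}: {100 * fn(base):.2f}%")
```

None of the three expressions depends on the exponents $n_1$, $n_2$, or $n$, which is why the text can quote a single error figure per rule once $b$ is fixed.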
7.5.3  Discussion

The  error  estimation  results  presented  above  depend  crucially  on  Assumptions  1 
and  2.  We  now  discuss  the  validity  and  scope  of  these  assumptions. 

Assumption 1 assumes that the two parameters being combined, $Q_1$ and $Q_2$, are independent random variables. This assumption is reasonable if $Q_1$ and $Q_2$ are exogenous parameters. It is also reasonable if the set of exogenous parameters used to calculate the order of magnitude of $Q_1$ is disjoint from the set of exogenous parameters used to calculate the order of magnitude of $Q_2$. However, if the orders of magnitude of $Q_1$ and $Q_2$ depend on the order of magnitude of a common exogenous parameter, then $Q_1$ and $Q_2$ are not independent.

Assumption 2 assumes that the two parameters being combined, $Q_1$ and $Q_2$, are uniformly distributed random variables. In the absence of any additional information, this assumption is reasonable for exogenous parameters. However, it breaks down for derived parameters. For example, if $Q_1$ and $Q_2$ are uniformly distributed random variables, and if $Q = Q_1 \;\mathit{op}\; Q_2$ (where $\mathit{op}$ is one of $+$, $-$, $*$, or $/$), then $Q$ is not uniformly distributed. Hence, when the order of magnitude of $Q$ is used to calculate the orders of magnitude of other parameters, Assumption 2 is not valid.

The above discussion implies that our error estimation technique has limited applicability. In particular, the errors estimated in this section cannot be directly used to estimate the error introduced in predictions based on a set of equations. Nonetheless, these techniques have proved useful in helping us select a reasonable set of order of magnitude reasoning rules, while alerting us to the possibility of large errors introduced by alternate rules.


7.6  Related  work 

Order of magnitude reasoning has been widely studied in AI. Murthy [Murthy, 1988] was the first to propose the use of a logarithmic scale for the order of magnitude of a parameter. In that paper, he also provides rules of inference to infer new orders of magnitude from old ones. Some of these rules are similar to ours. For example, he includes rules 3b, 3c, 4b, and 4c. However, instead of rule 1, he proposes the rule $\mathit{om}(q_1 * q_2) = \mathit{om}(q_1) + \mathit{om}(q_2)$ (which is rule 1'), and instead of rule 3a, he proposes the rule $\mathit{om}(q_1 + q_2) = \mathit{om}(q_1)$ when $\mathit{om}(q_1) = \mathit{om}(q_2)$ (which is rule 3a'). As we saw in section 7.5.2, the estimated error in these rules is too large, and hence we have chosen not to include them in NAPIER. Unlike our work, Murthy provides no analysis of how his inference rules can be used to find the orders of magnitude of parameters related by sets of simultaneous equations. In addition, we also analyze the complexity of order of magnitude inference, and present an approximate reasoning technique that works well in practice.

Raiman [Raiman, 1991; Raiman, 1986] explores the foundations of symbolic order of magnitude reasoning. He defines a variety of order of magnitude scales, such as Close and Comparable, built out of the basic order of magnitude granularities, Small and Rough. He introduces ESTIMATES, a system to solve order of magnitude equations. The primary difference between NAPIER and ESTIMATES is one of emphasis: NAPIER can be viewed as providing justifications for making order of magnitude assumptions; ESTIMATES can be viewed as a formalization of the use of such order of magnitude assumptions to symbolically manipulate and simplify equations.

Order of magnitude reasoning in the O(M) formalism [Mavrovouniotis and Stephanopolous, 1987] uses a parameter $e$ to represent the largest parameter that can be considered to be “much smaller” than 1. This is analogous to the parameter $b$ in NAPIER (i.e., $b = 1/e$). However, there are a number of differences between O(M) and NAPIER. First, the O(M) formalism is based on order of magnitude relations between parameters. Hence, it works best when equations involve only links (links are ratios of parameters). NAPIER, on the other hand, is based on the orders of magnitude of the parameters themselves, and hence works with any algebraic equations. This is advantageous because it is not always possible to convert equations into equations involving only links. Second, O(M) requires equations to be converted into assignments, which allow a new relation or range to be inferred from already known relations. This is a serious restriction since equations can be converted to assignments only in the absence of simultaneous equations. As we have seen, NAPIER does not have this restriction.

NAPIER is also related to interval reasoning discussed in [Moore, 1979; Simmons, 1986; Sacks, 1987]. NAPIER can be viewed as interval reasoning in which the end points of the interval are restricted to a particular set of points of the form $b^n$, with specified base $b$, and any integer $n$. The drawback of this restriction is that under certain conditions, compared to interval reasoning, the bounds inferred by NAPIER are unnecessarily loose (e.g., see the discussion of rule 3a in section 7.2.1). The advantage of this restriction is that, unlike traditional interval reasoners, NAPIER is able to use sets of non-linear simultaneous equations to infer parameter bounds. In addition, the ability to simultaneously process all the equations in a set allows NAPIER to exploit global constraints to compute tighter bounds (see section 7.4). Another distinguishing characteristic of NAPIER, which classifies it as an order of magnitude reasoning system rather than just an interval reasoner, is the use of heuristic rules (e.g., rule 3b).
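
To make the restriction to end points of the form $b^n$ concrete, the following minimal sketch maps a positive value to the integer $n$ whose interval $[b^n, b^{n+1})$ contains it. The names `om` and `om_interval` are illustrative, and the restriction to positive values and an integer base greater than 1 is an assumption of the sketch, not a statement about how NAPIER is implemented.

```python
from typing import Tuple


def om(q: float, b: int = 10) -> int:
    """Return the integer n such that b**n <= q < b**(n + 1), for q > 0."""
    if q <= 0:
        raise ValueError("this sketch only handles positive values")
    if b <= 1:
        raise ValueError("the base must be greater than 1")
    n = 0
    # Step up while the next power of b still does not exceed q.
    while b ** (n + 1) <= q:
        n += 1
    # Step down while the current power of b is already larger than q.
    while b ** n > q:
        n -= 1
    return n


def om_interval(n: int, b: int = 10) -> Tuple[float, float]:
    """Return the half-open interval [b**n, b**(n + 1)) denoted by order n."""
    return (b ** n, b ** (n + 1))


if __name__ == "__main__":
    for value in (0.05, 1, 999.9, 1000):
        n = om(value)
        print(value, "has order", n, "interval", om_interval(n))
```

The loops compare against exact powers rather than calling a floating-point logarithm, which avoids rounding a value such as 1000 into the wrong interval.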


7.7  Summary 

In this chapter we described an implemented order of magnitude reasoning system called NAPIER. NAPIER defines the order of magnitude of a parameter on a logarithmic scale and uses a set of rules to propagate orders of magnitude through equations. A novel feature of NAPIER is its handling of non-linear simultaneous equations. Since the order of magnitude reasoning rules are all disjunctions of linear inequalities, NAPIER is able to use linear programming, in conjunction with backtracking, to find bounds on the orders of magnitude of parameters related by sets of non-linear simultaneous equations.

We also showed that order of magnitude reasoning using NAPIER’s rules is intractable. Hence, NAPIER uses an approximate reasoning technique, based on causal ordering, leading to a practically useful system. This approximate reasoning technique trades off accuracy for speed, though in practice there does not appear to be any loss of accuracy.

Some of NAPIER’s rules are heuristic rules, and we have estimated the error introduced by the use of these rules. We have also shown that intuitively appealing alternate heuristic rules lead to large estimated errors.


Chapter  8 

Model  selection  program  and 
results 


In this chapter we describe an implemented model selection program based on the algorithms developed in the previous two chapters. The program assumes that the knowledge base of component and model fragment classes satisfies all the restrictions introduced in Chapters 5 and 6. However, the actual knowledge base that we have constructed does not satisfy two of the restrictions: (a) the knowledge base does not include all the ownership constraints (justified in Section 5.4.5); and (b) parameters are not required to be locally self-regulating (justified in Section 6.6.3).

In addition to the knowledge base of component and model fragment classes, the program has the following inputs:

1.  The structure of the device, which includes a description of the components of the device, the physical and structural properties of these components, and the structural relations between these components. As discussed in Section 3.6.1, this is the structural context of the device.

2.  The expected behavior of the device.

3.  Orders of magnitude of initial values and exogenous values of parameters, which are used in generating the behavioral context of the device. Initial values are used when a parameter is determined by integration, while exogenous values are used when a parameter is assumed to be exogenous. Section 8.1.2 discusses this in detail.

4.  Orders of magnitude of the thresholds used in behavioral constraints. This allows us to check the behavioral constraints.

The program produces an adequate model by first finding an initial causal model and then simplifying this causal model using the variant of find-minimal-causal-model discussed in Section 5.7. Section 8.1 discusses a heuristic method for finding an initial causal model that is simpler than the most accurate model, and demonstrates this method on the temperature gauge shown in Figure 1.1. Section 8.2 illustrates the simplification procedure of Section 5.7 on the initial causal model of the above temperature gauge. Finally, Section 8.3 describes the results of running this model selection program on a variety of electromechanical devices.


8.1  Finding  an  initial  causal  model 

The model selection algorithm developed in Section 5.1.2 was based on the fact that a causal model exists if and only if the most accurate model of the device is a causal model. Hence, a minimal causal model can be found by simplifying the most accurate model of the device. However, this is often undesirable because the most accurate device model can be unnecessarily complex, so that simplifying it can take a long time. In this section we introduce a heuristic technique for finding an initial causal model. This initial causal model is a subset of, and hence simpler than, the most accurate model. The heuristic technique is applicable only if the knowledge base satisfies the following restriction:

•  If  a  model  satisfies  all  the  structural  and  behavioral  coherence  constraints,  then 
any  simpler  consistent  and  complete  model  that  uses  model  fragments  from  the 
same  assumption  classes  also  satisfies  all  the  structural  and  behavioral  coherence 
constraints. 

This restriction ensures that any causal model, not just the most accurate model, can be simplified using the techniques developed in Chapter 5. We assume that our knowledge base satisfies the above restriction.

The heuristic technique for finding an initial causal model is based on the component interaction heuristic. Section 8.1.1 introduces this heuristic, and Section 8.1.2 describes the heuristic technique.

8.1.1  Component  interaction  heuristic 

A device model can be viewed as having two major parts: (a) models of individual components; and (b) models of interactions between components. Components can interact with each other only when appropriate structural relations hold between them. For example, in our knowledge base, the connected-to relation between terminals supports electrical interactions between the connected terminals. Hence, two components can electrically interact with each other if a terminal of one component is connected-to a terminal of the other component. As another example, the coiled-around relation between wires and physical objects supports thermal interactions between the wire and the physical object that it is coiled-around.

In addition to requiring appropriate structural relations, components can interact only when the component models are compatible with the type of interaction under consideration. For example, a wire can electrically interact with a battery if one of the wire’s terminals is connected-to one of the battery’s terminals. However, this interaction can take place only if both the wire and the battery are being modeled as electrical components, e.g., modeling the wire as an electrical conductor, and the battery as a voltage source is compatible with the electrical interaction.

Hence,  components  can  interact  with  each  other  if  the  following  conditions  are 
satisfied:  (a)  the  components  are  related  by  the  structural  relations  that  support 
the  interaction;  and  (b)  the  component  models  are  compatible  with  the  interaction. 
The  component  interaction  heuristic  is  based  directly  on  the  above  observations.  It 
states  that  if  a  set  of  components  are  related  by  one  or  more  structural  relations 
that  support  an  interaction,  and  if  one  of  the  component  models  is  compatible  with 
this  interaction,  then  the  remaining  component  models  must  be  augmented  to  be 
compatible  with  this  interaction.  This  allows  the  components  in  the  set  to  interact 
with  each  other  via  that  interaction.  Note  that  if  none  of  the  component  models  is 
compatible with the interaction, then no augmentations are necessary.

The component interaction heuristic is implemented as a set of heuristic coherence constraints. Each such constraint is a version of the component interaction heuristic that is specialized for a particular type of interaction and a particular set of structural relations. Heuristic coherence constraints are expressed as Horn rules, and like structural and behavioral constraints, are associated with model fragment classes. For example, the following heuristic coherence constraint

(implies
  (and (terminals ?object ?term1)
       (voltage-terminal ?term1)
       (connected-to ?term1 ?term2)
       (terminal-of ?term2 ?comp2))
  (electrical-component ?comp2))

in the electrical-component model fragment class¹ says that if a component is being modeled as an electrical-component, and one of the component’s voltage terminals is connected to a terminal of another component, then the other component must also be modeled as an electrical-component. This allows the two components to interact by sharing voltages at the connected terminals.

As another example, a heuristic coherence constraint associated with the thermal-object model fragment class is the following:

(implies
  (and (wire ?object)
       (coiled-around ?object ?core))
  (thermal-object ?core))

which implements the component interaction heuristic for the thermal interaction between a wire and an object around which it is coiled.
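
The way such Horn rules are checked against a device description can be sketched as follows. This is an illustrative sketch only, not the program's actual implementation: facts are represented as tuples of strings, the helper names (`match`, `solutions`, `missing_consequents`) are hypothetical, and the terminal names `t1` and `t2` are invented for the example. Because the electrical constraint is attached to the electrical-component class, the sketch adds the implicit antecedent that `?object` is already modeled as an electrical-component.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Fact = Tuple[str, ...]
Binding = Dict[str, str]
Rule = Tuple[List[Fact], Fact]  # (antecedent patterns, consequent pattern)


def match(pattern: Fact, fact: Fact, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals fact, or return None."""
    if len(pattern) != len(fact):
        return None
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solutions(antecedents: List[Fact], facts: Set[Fact],
              binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding that satisfies all antecedents."""
    binding = {} if binding is None else binding
    if not antecedents:
        yield binding
        return
    first, rest = antecedents[0], antecedents[1:]
    for fact in facts:
        extended = match(first, fact, binding)
        if extended is not None:
            yield from solutions(rest, facts, extended)


def missing_consequents(rule: Rule, facts: Set[Fact]) -> Set[Fact]:
    """Return the consequents the rule requires that are not yet facts."""
    antecedents, consequent = rule
    missing = set()
    for binding in solutions(antecedents, facts):
        required = tuple(binding.get(term, term) for term in consequent)
        if required not in facts:
            missing.add(required)
    return missing


ELECTRICAL_RULE: Rule = (
    [("electrical-component", "?object"),  # implicit: rule lives in this class
     ("terminals", "?object", "?term1"),
     ("voltage-terminal", "?term1"),
     ("connected-to", "?term1", "?term2"),
     ("terminal-of", "?term2", "?comp2")],
    ("electrical-component", "?comp2"),
)

# Hypothetical device facts; the terminal names t1 and t2 are made up.
FACTS: Set[Fact] = {
    ("electrical-component", "battery-5"),
    ("terminals", "battery-5", "t1"),
    ("voltage-terminal", "t1"),
    ("connected-to", "t1", "t2"),
    ("terminal-of", "t2", "wire-4"),
}

if __name__ == "__main__":
    print(missing_consequents(ELECTRICAL_RULE, FACTS))
```

With these facts the only unsatisfied consequent is that wire-4 must also be modeled as an electrical-component; once that fact is added, the constraint is satisfied.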

We will require that the initial causal model must satisfy all applicable heuristic coherence constraints. We now show how these constraints are used to find an initial causal model.

¹Hence, “?object” is bound to a component being modeled as an electrical-component.

Figure 8.1: Algorithm for finding an initial causal model.

8.1.2  Finding an initial causal model

Figure 8.1 shows a flowchart describing our algorithm for finding an initial causal model. The input to the algorithm is as described earlier: (a) the structure of the device; (b) the expected behavior of the device; (c) orders of magnitude of initial values and exogenous values of parameters; and (d) orders of magnitude of thresholds. In addition, each component can have a set of model fragment classes preselected for it; these correspond to modeling decisions made by the user.

There  are  five  major  steps  in  this  algorithm,  numbered  1-5  in  the  rectangular 
boxes  of  Figure  8.1.  Each  step,  except  step  3  entitled  “Generate  behavior,”  can 
modify  the  device  model  by  adding  one  or  more  model  fragments  to  it.  Whenever 
a  step  adds  a  model  fragment  to  the  device  model,  it  also  adds  the  most  accurate 
model  fragment  from  every  assumption  class  required  by  the  model  fragment.  Hence, 
at  the  end  of  every  step,  all  the  requires  constraints  are  satisfied. 

We now describe the details of the five steps, and the flow of control between them. We will illustrate the algorithm on the temperature gauge in Figure 1.1. Figure 8.2 shows some, though not all, of the components in the temperature gauge, together with their component classes. Note, in particular, the last four components, which correspond to structural abstractions that are automatically created from the device description (see Section 2.4.2). bms-wire is an abstraction representing wire-4 coiled-around bms-3, while the remaining three abstractions represent pointer-2, bms-3, and battery-5, respectively, immersed-in atm-6.


Component       Component classes
thermistor-1    Thermistor
pointer-2       Pointer
bms-3           Bimetallic-strip
wire-4          Wire
battery-5       Battery
atm-6           Atmosphere
bms-wire        Coil-structure
atm-pointer     Immersion-structure
atm-bms         Immersion-structure
atm-battery     Immersion-structure

Figure 8.2: Components and their initial models.
Include expected behavior parameters

In  the  first  step,  the  device  model  is  augmented  to  ensure  that  it  contains  all  the 
parameters  in  the  expected  behavior.  This  is  a  good  place  to  start  because  every 
causal  model  must  contain  every  parameter  in  the  expected  behavior. 

Recall that a parameter represents a numerical attribute of some component. The relationship between a component and a parameter representing one of its numerical attributes is mediated by parameter functions: applying the parameter function to the component returns the parameter. In Chapter 2 we saw that the attributes clause in the definition of a model fragment class defines the parameter functions that can be used on instances of that class. Hence, a device model contains a parameter if and only if the component corresponding to the parameter is an instance of the model fragment class that defines the corresponding parameter function.

More precisely, if $p$ is a parameter, let $c_p$ denote the component of which $p$ is a numerical attribute, let $f_p$ denote the parameter function such that $f_p(c_p) = p$, and let $M_p$ be the model fragment class that defines $f_p$. A device model contains the parameter $p$ if and only if $c_p$ is modeled as an instance of $M_p$.

If the expected behavior contains a parameter $p$ that the device model does not contain, then $c_p$ is not being modeled as an instance of $M_p$. This situation can be rectified by modeling $c_p$ as an instance of some model fragment class $M$ such that:

1.  $M$ is a specialization of $M_p$;

2.  $M$ is in the transitive closure of the possible-models of the component class of $c_p$; and

3.  $M(c_p)$ is the most accurate applicable model fragment in its assumption class.

The first condition ensures that $c_p$ is an instance of $M_p$, so that $p$ becomes part of the device model. The second condition ensures that $M$ is a possible way of modeling $c_p$. The third condition ensures that we only consider model fragments that are parts of the most accurate model of the device, so that the initial causal model that we create will be a subset of the most accurate device model.
Note that $M(c_p)$ must be an applicable model fragment, so all the structural and behavioral preconditions of $M$ must be satisfied with ?object bound to $c_p$. However, since the device’s behavior has not yet been generated, behavioral preconditions cannot be evaluated. Hence, any model fragment classes with behavioral preconditions are inapplicable. This is not a serious problem because almost none of the most accurate model fragment classes in our library have behavioral preconditions, since behavioral preconditions primarily control the use of approximations. In addition, we assume that if a most accurate model fragment class has behavioral preconditions, then all the other model fragment classes in that assumption class also have behavioral preconditions, so that none of them are applicable at this stage. This ensures that we only consider most accurate model fragment classes at this stage. Most accurate model fragment classes with behavioral preconditions will be considered in the fourth step, after the behavior is generated.

In general, for any parameter $p$ and component $c_p$, there can be more than one model fragment class that satisfies the above three conditions. Let $M_1$ and $M_2$ be model fragment classes that satisfy the above three conditions, and let $M_2$ be a specialization of $M_1$. Hence, modeling $c_p$ as an instance of $M_2$ will also model it as an instance of $M_1$. However, modeling $c_p$ as an instance of $M_1$ will not model it as an instance of $M_2$. Hence, to keep the initial causal model as simple as possible, we make $c_p$ an instance of the most general model fragment class that satisfies the above three conditions. The use of the other model fragment classes that satisfy the above three conditions will be discussed later.

Let  us  illustrate  this  step  on  the  temperature  gauge  in  Figure  1.1.  Let  us  assume 
that  the  expected  behavior  of  this  temperature  gauge  is: 

(causes (temperature thermistor-1)
        (angular-position pointer-2))

This means that the parameters representing the temperature of thermistor-1 and the angular position of pointer-2 must be part of the device model. The temperature parameter function is defined in the Temperature-model model fragment class. A search of the possible-models of Thermistor reveals that Thermal-object and Thermal-thermistor are the most accurate model fragment classes that are specializations of Temperature-model. However, Thermal-object is a generalization of Thermal-thermistor. Hence, we model thermistor-1 as a Thermal-object, i.e., the device model is augmented with the model fragment Thermal-object(thermistor-1). Similarly, pointer-2 is modeled as an instance of Rotating-object.
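
The choice of the most general class among the candidates can be sketched as a small filter over the specialization hierarchy. The sketch below is illustrative only: the `PARENTS` table encodes just the two generalization facts stated above (treating them as direct links is an assumption), and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical fragment of the specialization hierarchy: each class maps to
# the classes it directly specializes.
PARENTS: Dict[str, Set[str]] = {
    "Thermal-thermistor": {"Thermal-object"},
    "Thermal-object": {"Temperature-model"},
    "Temperature-model": set(),
}


def generalizations(cls: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every class that cls specializes, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(parents.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_general(candidates: Iterable[str],
                 parents: Dict[str, Set[str]]) -> Set[str]:
    """Keep the candidates that do not specialize another candidate."""
    pool = set(candidates)
    return {c for c in pool if not (generalizations(c, parents) & pool)}


if __name__ == "__main__":
    # Both classes satisfy the three conditions for the temperature parameter.
    print(most_general({"Thermal-object", "Thermal-thermistor"}, PARENTS))
```

For the temperature parameter of thermistor-1, the filter keeps Thermal-object and drops Thermal-thermistor, which matches the choice made in the text.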

In addition, as discussed earlier, the first step is not complete until all the requires constraints are satisfied. In particular, if the device model contains a model fragment $M(c)$, where $M$ is a model fragment class and $c$ is a component, and if the required-assumption-classes of $M$ specifies assumption class $A$, then the device model must contain a model fragment from assumption class $A(c)$. Once again, to ensure that the resulting model is a subset of the most accurate device model, we augment the device model with the most accurate applicable model fragment in $A(c)$, i.e., if $M_a$ is the most accurate model fragment class of $A$, we augment the device model with model fragment $M_a(c)$. ($M_a$ is assumed to be a possible-model of the component class of $c$.)

In the current example, Thermal-object specifies Thermal-model-class as one of its required-assumption-classes. The most accurate model fragment class of Thermal-model-class is Dynamic-thermal-model (see Equation 6.2). Hence, to satisfy this requires constraint, thermistor-1 is made an instance of Dynamic-thermal-model. The resulting device model is shown in Figure 8.3.


Component       Model
thermistor-1    Thermal-object, Dynamic-thermal-model
pointer-2       Rotating-object
bms-3
wire-4
battery-5
atm-6
bms-wire
atm-pointer
atm-bms
atm-battery

Figure 8.3: Component models after the expected behavior parameters have been included.
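
The bookkeeping that keeps the requires constraints satisfied can be sketched as a small closure computation. This is a minimal sketch under stated assumptions, not the program's implementation: the two tables hold only the Thermal-object example from the text, the function name `add_fragment` is hypothetical, and the sketch always adds the most accurate class of each required assumption class without checking whether another fragment from that class is already present.

```python
from typing import Dict, List, Set, Tuple

Fragment = Tuple[str, str]  # (model fragment class, component)

# Hypothetical tables covering only the example in the text.
REQUIRED_ASSUMPTION_CLASSES: Dict[str, List[str]] = {
    "Thermal-object": ["Thermal-model-class"],
}
MOST_ACCURATE: Dict[str, str] = {
    "Thermal-model-class": "Dynamic-thermal-model",
}


def add_fragment(model: Set[Fragment], fragment_class: str,
                 component: str) -> None:
    """Add a fragment and, transitively, everything its requires need."""
    pending: List[Fragment] = [(fragment_class, component)]
    while pending:
        fragment = pending.pop()
        if fragment in model:
            continue
        model.add(fragment)
        cls, comp = fragment
        for assumption_class in REQUIRED_ASSUMPTION_CLASSES.get(cls, []):
            # Pick the most accurate class so that the result stays a subset
            # of the most accurate device model.
            pending.append((MOST_ACCURATE[assumption_class], comp))


if __name__ == "__main__":
    device_model: Set[Fragment] = set()
    add_fragment(device_model, "Thermal-object", "thermistor-1")
    add_fragment(device_model, "Rotating-object", "pointer-2")
    for cls, comp in sorted(device_model, key=lambda f: (f[1], f[0])):
        print(comp, cls)
```

Running the sketch on the two fragments chosen in the first step yields the three fragments listed in Figure 8.3.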
Enforcing heuristic and structural coherence constraints

Once the device model contains all the parameters in the expected behavior, the algorithm checks all applicable structural and heuristic coherence constraints. If all these constraints are satisfied, the algorithm merely proceeds to the next step. However, if a constraint is not satisfied, it augments the device model as described below.

Recall that a structural coherence constraint is like a Horn rule, except that the consequent of the rule is a disjunction of all the model fragments in an assumption class. Hence, if a structural coherence constraint is not satisfied, it means that the current device model does not include a model fragment from that assumption class. This situation is exactly analogous to the case where one of the requires constraints is not satisfied. Hence, it is rectified in the same way: the device model is augmented with the most accurate applicable model fragment in the assumption class. Once again, by choosing the most accurate model fragment, we ensure that the resulting device model continues to be a subset of the most accurate device model.

Heuristic coherence constraints, on the other hand, are just Horn rules, i.e., the
consequent of the rule is of the form M(c), where M is a model fragment class and c
is a component. Hence, if a heuristic coherence constraint is not satisfied, it means
that the current device model does not include the model fragment in the consequent
of the rule, i.e., c is not an instance of M. This situation is exactly analogous to
the case, discussed earlier, where Cp had to be an instance of Mp to ensure that
the device model contained parameter p. Hence, it is rectified in exactly the same
way, i.e., by making c an instance of the most general specialization of M that is a
possible-model of the component class of c.
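One way to picture the choice of "the most general specialization of M that is a possible-model of the component class of c" is a breadth-first walk down the specialization hierarchy. The sketch below is an assumption-laden illustration: the tables `SPECIALIZATIONS` and `POSSIBLE_MODELS`, the component class name `Thermistor`, and the decision to count M itself as a candidate are all hypothetical, and when several equally general candidates exist the sketch simply returns the first one it meets.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical fragment of the hierarchy: class -> immediate specializations.
SPECIALIZATIONS: Dict[str, List[str]] = {
    "Temperature-model": ["Thermal-object"],
    "Thermal-object": ["Thermal-thermistor", "Thermal-bimetallic-strip"],
}
# Hypothetical possible-models of one component class.
POSSIBLE_MODELS: Dict[str, Set[str]] = {
    "Thermistor": {"Thermal-object", "Thermal-thermistor"},
}


def most_general_specialization(m: str, component_class: str) -> Optional[str]:
    """Return the shallowest class at or below m that the component class allows."""
    allowed = POSSIBLE_MODELS.get(component_class, set())
    queue = deque([m])
    seen = {m}
    while queue:
        cls = queue.popleft()
        if cls in allowed:
            return cls  # breadth-first order means no more general match exists
        for child in SPECIALIZATIONS.get(cls, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return None


choice = most_general_specialization("Temperature-model", "Thermistor")
print(choice)
```

Preferring the shallowest match keeps the initial device model as uncommitted as possible; more specific classes remain available as alternatives if the model later has to be augmented.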

For example, pointer-2 was modeled as a Rotating-object in the first step.
Because of the linkage connecting pointer-2 to the free end of bimetallic strip bms-3, a
heuristic coherence constraint requires a kinematic interaction between pointer-2
and bms-3. This constraint can be satisfied by modeling bms-3 as an instance
of Thermal-bimetallic-strip, which models the deflection of the free end of the
bimetallic strip as a function of its temperature.

As in the first step, all requires constraints are also enforced. In particular,
Thermal-bimetallic-strip is a specialization of Thermal-object, which requires
the use of a model fragment class in the Thermal-model-class assumption class.
As before, the most accurate model fragment is used, so that the device model is
augmented with Dynamic-thermal-model(bms-3).

If  there  are  any  changes  to  the  device  model  in  this  step,  the  step  is  repeated 
to  check  whether  additional  structural  or  heuristic  coherence  constraints  have  been 
violated.  This  repetition  continues  until  all  such  constraints  are  satisfied,  at  which 
point  the  algorithm  proceeds  to  the  next  step. 

For example, we just modeled bms-3 as an instance of Thermal-object. Since
bms-3 is immersed-in atm-6, a thermal interaction is possible between them. Hence,
a heuristic coherence constraint requires that atm-6 should be modeled as a
Thermal-object and atm-bms should be modeled as a Thermal-conductor. Similarly,
since wire-4 is coiled-around bms-3, a thermal interaction is possible between them,
and hence a heuristic coherence constraint requires that wire-4 should be modeled
as a Thermal-object and bms-wire as a Thermal-conductor.
Resistive-thermal-conductor is the most general specialization of Thermal-conductor that is
also a possible-model of both Immersion-structure and Coil-structure. Hence,
atm-bms and bms-wire are both made instances of Resistive-thermal-conductor.

Modeling the atmosphere as a Thermal-object means that a thermal interaction
is possible with all components immersed-in it. Hence, a heuristic coherence
constraint requires that both pointer-2 and battery-5, which are immersed-in
the atmosphere, must be modeled as Thermal-objects, and the corresponding
Immersion-structures are modeled as Thermal-conductors. Finally, to satisfy all
the requires constraints, all the Thermal-objects are also modeled as instances of
Dynamic-thermal-model. Once this is done, all the structural and heuristic
coherence constraints are satisfied, and the algorithm continues to the next step. The
resulting model is shown in Figure 8.4.
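The repetition of this step until no constraint is violated is a fixpoint computation, which can be sketched as below. Everything in the sketch is an assumption made for illustration: the `IMMERSED_IN` table, the single rule `thermal_interaction_rule` standing in for the heuristic coherence constraints, and the shortcut of recording Thermal-bimetallic-strip together with its generalization Thermal-object.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical structural facts: component -> medium it is immersed in.
IMMERSED_IN: Dict[str, str] = {
    "bms-3": "atm-6",
    "pointer-2": "atm-6",
    "battery-5": "atm-6",
}

Model = Dict[str, Set[str]]
Rule = Callable[[Model], Set[Tuple[str, str]]]


def thermal_interaction_rule(model: Model) -> Set[Tuple[str, str]]:
    """If one end of an immersed-in link is a Thermal-object, both ends must be."""
    wanted = set()
    for component, medium in IMMERSED_IN.items():
        if "Thermal-object" in model[component] or "Thermal-object" in model[medium]:
            wanted.add((component, "Thermal-object"))
            wanted.add((medium, "Thermal-object"))
    return wanted


def enforce_to_fixpoint(model: Model, rules: List[Rule]) -> int:
    """Apply all rules repeatedly; return the number of passes that were needed."""
    passes = 0
    while True:
        passes += 1
        additions = {
            (c, f) for rule in rules for (c, f) in rule(model) if f not in model[c]
        }
        if not additions:
            return passes
        for component, fragment in additions:
            model[component].add(fragment)


model: Model = {
    "bms-3": {"Thermal-bimetallic-strip", "Thermal-object"},
    "pointer-2": {"Rotating-object"},
    "battery-5": set(),
    "atm-6": set(),
}
passes = enforce_to_fixpoint(model, [thermal_interaction_rule])
print(passes, sorted(model["pointer-2"]))
```

In this toy run the first pass makes atm-6 a Thermal-object, the second pass propagates that to pointer-2 and battery-5, and the third pass finds nothing left to add.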

Generating  the  behavior 

In the third step, the algorithm uses the device model constructed in the previous
steps to generate the behavior. This involves calculating the orders of magnitude
of all the parameters, using the techniques developed in Chapter 7. The orders of
magnitude of parameters assumed to be exogenous in the device model are found in
the input to the algorithm. Hence, the input must specify an exogenous value for
every parameter that can be assumed to be exogenous in some device model.

Component      Model
thermistor-1   Thermal-object
               Dynamic-thermal-model
pointer-2      Rotating-object
               Thermal-object
               Dynamic-thermal-model
bms-3          Thermal-bimetallic-strip
               Dynamic-thermal-model
wire-4         Thermal-object
               Dynamic-thermal-model
battery-5      Thermal-object
               Dynamic-thermal-model
atm-6          Thermal-object
               Dynamic-thermal-model
bms-wire       Resistive-thermal-conductor
atm-pointer    Resistive-thermal-conductor
atm-bms        Resistive-thermal-conductor
atm-battery    Resistive-thermal-conductor

Figure 8.4: Component models after the heuristic coherence constraints have been
repeatedly satisfied.

Recall that the techniques of Chapter 7 were applicable only to sets of algebraic
equations. However, the device model constructed above contains differential
equations. We address this mismatch by only computing the behavior at a particular
point in time. The particular point in time is defined by the values of the parameters
being integrated. One can think of these values as being analogous to initial values
specified for numerical integration. These initial values are specified in the input to
the algorithm, and hence the input must specify an initial value for every parameter
that can be integrated in some device model.

Consider the following subset of the equations of the device model in Figure 8.4.
This subset corresponds to the thermal interaction between battery-5 and atm-6:

\[
\begin{aligned}
dT_b/dt &= C_b f_{ab} \\
dT_a/dt &= C_a f_a \\
f_{ab} &= \gamma_{ab}\,(T_a - T_b) \\
&\mathit{exogenous}(C_b) \\
&\mathit{exogenous}(C_a) \\
&\mathit{exogenous}(\gamma_{ab})
\end{aligned}
\]

where the parameters have the following denotations:

$T_b$: Temperature of battery-5
$C_b$: Heat capacity of battery-5
$T_a$: Temperature of atm-6
$C_a$: Heat capacity of atm-6
$f_{ab}$: Heat flow from atm-6 to battery-5
$f_a$: Net heat flowing into atm-6
$\gamma_{ab}$: Thermal conductance of atm-battery

Since $C_a$, $C_b$, and $\gamma_{ab}$ are assumed to be exogenous in this model, the program looks up
their exogenous values from the input. Since $T_a$ and $T_b$ are determined by integrating
$dT_a/dt$ and $dT_b/dt$, respectively, the program looks up their initial values from the
input. Then, using the exogenous value of $\gamma_{ab}$ and the initial values of $T_a$ and $T_b$,
the program calculates the order of magnitude of $f_{ab}$, using the techniques developed
in Chapter 7. Similarly, the program calculates the orders of magnitude of all the
parameters in the device model.
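The idea of evaluating the behavior at a single point in time can be illustrated numerically. The sketch below is not the order of magnitude technique of Chapter 7; it is a crude stand-in that takes the order of magnitude of a value to be the floor of the base-10 logarithm of its absolute value. All numeric values, the dictionary names, and the helper `order_of_magnitude` are hypothetical. The equations are used exactly as written above.

```python
import math

# Hypothetical input to the program (values chosen only for illustration).
exogenous = {"C_b": 2e-3, "gamma_ab": 0.5}
initial = {"T_a": 300.0, "T_b": 350.0}  # values of the integrated parameters


def order_of_magnitude(value: float) -> int:
    """Crude stand-in: floor(log10(|value|)); undefined for zero."""
    if value == 0:
        raise ValueError("zero has no order of magnitude in this sketch")
    return math.floor(math.log10(abs(value)))


# With T_a and T_b fixed at their initial values, the equations are algebraic.
f_ab = exogenous["gamma_ab"] * (initial["T_a"] - initial["T_b"])
dTb_dt = exogenous["C_b"] * f_ab  # dT_b/dt = C_b * f_ab, as in the text above

print(f_ab, order_of_magnitude(f_ab), order_of_magnitude(dTb_dt))
```

Fixing the integrated parameters at their initial values is what turns the differential equations into a purely algebraic evaluation.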

Enforcing  behavioral  coherence  constraints 

In the fourth step, the algorithm uses the behavior generated above to enforce all the
behavioral coherence constraints. This step is exactly analogous to the way structural
coherence constraints were enforced in the second step. If there are any changes
to the device model, the algorithm loops back to the second step, to ensure that
all the structural and heuristic coherence constraints continue to be satisfied. The
algorithm loops through steps two, three, and four, until all the heuristic, structural,
and behavioral coherence constraints are satisfied. Once this happens, the algorithm
proceeds to check whether the current device model satisfies the expected behavior. If
the expected behavior is satisfied, we have an initial causal model, and the algorithm
terminates. However, if the expected behavior is not satisfied, the algorithm proceeds
to the fifth step.

The device model constructed above satisfies all the behavioral coherence
constraints, but does not satisfy the expected behavior. Hence, the algorithm proceeds
to the fifth step with that model.

Augmenting  the  device  model 

In the fifth step, the algorithm augments the device model as follows. Recall that
in the first step, to ensure that the device model contained the parameter p, the
algorithm had to make Cp an instance of Mp. To do this it chose the most general
specialization of Mp that was also a possible-model of the component class of Cp.
However, other more specific model fragment classes could also have been used to
satisfy this constraint. Similarly, the algorithm chose the most general way to satisfy
the heuristic coherence constraints in the second step. Once again, more specific
model fragment classes could have been used to satisfy these constraints.

In  the  fifth  step,  the  algorithm  augments  the  device  model  with  one  of  the  more 
specific  ways  of  satisfying  the  constraints  in  the  first  and  second  step.  The  algorithm 
then  loops  back  to  the  second  step,  and  continues  looping  through  steps  two,  three, 
four,  and  five,  until  a  causal  model  is  found. 

If  no  causal  model  is  found,  and  there  are  no  additional  ways  of  satisfying  the 
constraints  in  the  first  and  second  step,  then  the  algorithm  terminates  with  failure, 
reporting  that  there  is  no  causal  model.  This  is  justified  because  the  component 
interaction  heuristic  guarantees  that  the  component  models  used  in  the  final  device 
model  cannot  interact  with  any  other  components,  and  hence  no  augmentation  of  the 
device  model  can  lead  to  a  causal  model. 
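The control flow of steps two through five can be summarized in a short sketch. It is a hypothetical skeleton, not the implemented program: each step is passed in as a callable, the key names in `steps` are invented for illustration, and the toy steps built by `make_toy_steps` do nothing except let the loop be exercised.

```python
from typing import Callable, Dict, List, Optional, Set


def find_initial_causal_model(
    model: Set[str], steps: Dict[str, Callable], max_rounds: int = 100
) -> Optional[Set[str]]:
    """Loop through steps 2-5 until the expected behavior is satisfied."""
    for _ in range(max_rounds):
        while True:
            steps["enforce_structural_and_heuristic"](model)        # step 2
            behavior = steps["generate_behavior"](model)            # step 3
            changed = steps["enforce_behavioral"](model, behavior)  # step 4
            if not changed:
                break
        if steps["satisfies_expected_behavior"](model, behavior):
            return model  # an initial causal model
        if not steps["augment"](model):                             # step 5
            return None   # no more specific alternatives remain: failure
    raise RuntimeError("gave up after max_rounds augmentations")


def make_toy_steps(alternatives: List[str]) -> Dict[str, Callable]:
    """Toy steps: the model is a set of names; augmenting adds the next one."""
    pending = list(alternatives)

    def augment(model: Set[str]) -> bool:
        if not pending:
            return False
        model.add(pending.pop(0))
        return True

    return {
        "enforce_structural_and_heuristic": lambda model: None,
        "generate_behavior": lambda model: {},
        "enforce_behavioral": lambda model, behavior: False,
        "satisfies_expected_behavior": lambda model, behavior: "Thermal-thermistor" in model,
        "augment": augment,
    }


found = find_initial_causal_model({"Thermal-object"}, make_toy_steps(["Thermal-thermistor"]))
failed = find_initial_causal_model({"Thermal-object"}, make_toy_steps([]))
print(sorted(found), failed)
```

Returning `None` corresponds to terminating with failure when no additional ways of satisfying the constraints of the first and second steps remain.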

For example, the first step satisfied the constraint that thermistor-1 be modeled
as an instance of Temperature-model by modeling thermistor-1 as an instance
of Thermal-object. A more specific way of satisfying this constraint is to model
thermistor-1 as a Thermal-thermistor. Since the device model constructed until
now is not a causal model, the fifth step augments the device model with the
Thermal-thermistor(thermistor-1) model fragment, and loops back to the second step.

Since Thermal-thermistors are also Electrical-components, a heuristic coherence
constraint requires that all components connected-to any of thermistor-1's
voltage terminals must be modeled as Electrical-components. Hence, wire-4 and
battery-5 should be modeled as Electrical-components. This is achieved by
modeling wire-4 as an Electrical-conductor, and battery-5 as a Voltage-source.

To complete the description of electrical conduction (i.e., to satisfy the requires
constraint associated with Electrical-conductor), wire-4 is modeled as an
instance of Resistor, and to complete the description of resistance wire-4 is further
modeled as a Temperature-dependent-resistance. Similarly, to complete the
description of a voltage source, battery-5 is modeled as a
Voltage-source-with-internal-resistance.

Using  this  model,  the  order  of  magnitude  behavior  is  generated,  and  the  behavioral 
coherence  constraints  are  checked.  Assuming  that  the  heat  generated  in  wire-4  is 
greater  than  the  electrical-power-threshold,  a  behavioral  coherence  constraint 
requires  that  wire-4  be  modeled  as  a  Thermal-resistor.  The  resulting  model, 
shown  in  Figure  8.5,  satisfies  all  the  heuristic,  structural,  and  behavioral  coherence 
constraints.  One  can  show  that  this  model  satisfies  the  expected  behavior,  and  hence 
it  is  the  initial  causal  model  generated  by  the  algorithm. 


8.2  Simplifying  the  model 

The initial causal model identified above is then simplified in two stages using the
techniques developed in Section 5.7. The behavior generated using this initial causal
model is used to evaluate all behavioral preconditions and behavioral coherence
constraints. We now illustrate the simplification procedure on the model in Figure 8.5.

Component      Model
thermistor-1   Thermal-object
               Dynamic-thermal-model
               Thermal-thermistor
pointer-2      Rotating-object
               Thermal-object
               Dynamic-thermal-model
bms-3          Thermal-bimetallic-strip
               Dynamic-thermal-model
wire-4         Thermal-object
               Dynamic-thermal-model
               Electrical-conductor
               Resistor
               Temperature-dependent-resistance
               Thermal-resistor
battery-5      Thermal-object
               Dynamic-thermal-model
               Voltage-source
               Voltage-source-with-internal-resistance
atm-6          Thermal-object
               Dynamic-thermal-model
bms-wire       Resistive-thermal-conductor
atm-pointer    Resistive-thermal-conductor
atm-bms        Resistive-thermal-conductor
atm-battery    Resistive-thermal-conductor

Figure 8.5: The initial causal model.

In the first stage of simplification, model fragments are replaced by their
approximations, until no more simplification by approximation is possible.
Dynamic-thermal-model has two immediate approximations: Constant-temperature-model and
Equilibrium-thermal-model. These correspond to exogenizing and equilibrating
the differential equation in Dynamic-thermal-model. Constant-resistance is the
only approximation of Temperature-dependent-resistance, Constant-voltage-source
is the only approximation of Voltage-source-with-internal-resistance,
and Ideal-thermal-conductor and Ideal-thermal-insulator are the two
approximations of Resistive-thermal-conductor.

Component      Simplest causal model           Minimal causal model
               by approximating
thermistor-1   Thermal-object                  Thermal-object
               [Constant-temperature-model]    Constant-temperature-model
               Thermal-thermistor              Thermal-thermistor
pointer-2      Rotating-object                 Rotating-object
               Thermal-object
               [Equilibrium-thermal-model]
bms-3          Thermal-bimetallic-strip        Thermal-bimetallic-strip
               [Equilibrium-thermal-model]     Equilibrium-thermal-model
wire-4         Thermal-object                  Thermal-object
               [Equilibrium-thermal-model]     Equilibrium-thermal-model
               Electrical-conductor            Electrical-conductor
               Resistor                        Resistor
               [Constant-resistance]           Constant-resistance
               Thermal-resistor                Thermal-resistor
battery-5      Thermal-object
               [Equilibrium-thermal-model]
               Voltage-source                  Voltage-source
               [Constant-voltage-source]       Constant-voltage-source
atm-6          Thermal-object                  Thermal-object
               [Constant-temperature-model]    Constant-temperature-model
bms-wire       Resistive-thermal-conductor     Resistive-thermal-conductor
atm-pointer    [Ideal-thermal-conductor]
atm-bms        Resistive-thermal-conductor     Resistive-thermal-conductor
atm-battery    [Ideal-thermal-conductor]

Figure 8.6: The two stages of simplifying the initial causal model. The boxed model
fragments in the second column (shown here in square brackets) are the approximate
model fragments that have replaced more accurate model fragments in the initial
causal model.

The second column in Figure 8.6 shows one possible result of approximating the
initial causal model's model fragments as much as possible, while retaining the causal
model property (Section 5.7.1 called this the "simplest causal model by
approximating"). In fact, in this case, the behavioral preconditions dictate that this is the only
such causal model. For example, Resistive-thermal-conductor(atm-battery)
cannot be approximated by Ideal-thermal-insulator(atm-battery) because the
thermal resistance of atm-battery is not large enough. In addition, model
fragments like Resistive-thermal-conductor(bms-wire) cannot be approximated
because then the model ceases to be a causal model.
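The first stage of simplification can be pictured as a greedy loop that keeps replacing a model fragment by one of its immediate approximations as long as the behavioral precondition holds and the result is still a causal model. The sketch below is a simplified illustration and is not claimed to be the procedure of Section 5.7.1: the `APPROXIMATIONS` table is taken from the text above, while the two predicates `precondition_ok` and `is_causal` are hypothetical stand-ins supplied by the caller.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Fragment = Tuple[str, str]  # (model fragment class, component)

# Immediate approximations named in the text.
APPROXIMATIONS: Dict[str, List[str]] = {
    "Dynamic-thermal-model": ["Constant-temperature-model", "Equilibrium-thermal-model"],
    "Temperature-dependent-resistance": ["Constant-resistance"],
    "Voltage-source-with-internal-resistance": ["Constant-voltage-source"],
    "Resistive-thermal-conductor": ["Ideal-thermal-insulator", "Ideal-thermal-conductor"],
}


def simplify_by_approximating(
    model: Set[Fragment],
    precondition_ok: Callable[[str, str], bool],
    is_causal: Callable[[FrozenSet[Fragment]], bool],
) -> Set[Fragment]:
    """Greedily replace fragments by approximations while the model stays causal."""
    current = set(model)
    changed = True
    while changed:
        changed = False
        for fragment_class, component in sorted(current):
            for approx in APPROXIMATIONS.get(fragment_class, []):
                candidate = (current - {(fragment_class, component)}) | {(approx, component)}
                if precondition_ok(approx, component) and is_causal(frozenset(candidate)):
                    current = candidate
                    changed = True
                    break
            if changed:
                break  # restart the scan on the updated model
    return current


# Toy predicates (assumptions): no thermal resistance is large enough for an
# insulator, and the model is causal only while bms-wire conducts resistively.
toy_model = {
    ("Resistive-thermal-conductor", "atm-battery"),
    ("Resistive-thermal-conductor", "bms-wire"),
}
simplified = simplify_by_approximating(
    toy_model,
    precondition_ok=lambda approx, component: approx != "Ideal-thermal-insulator",
    is_causal=lambda m: ("Resistive-thermal-conductor", "bms-wire") in m,
)
print(sorted(simplified))
```

In the toy run, atm-battery is approximated by Ideal-thermal-conductor, while bms-wire keeps its resistive model because approximating it would destroy the causal model property.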

The above simplest causal model by approximating is then simplified further by
retaining only the relevant model fragments (as described in Section 5.7.2). The
resulting minimal causal model is shown in the third column of Figure 8.6. Irrelevant
model fragments describing the thermal properties of pointer-2 and battery-5 have
been dropped, as have the model fragments describing the heat conduction properties
of atm-pointer and atm-battery.


8.3  Implementation  and  results 

We have implemented the above model selection algorithm in Common Lisp, and
tested it on a variety of electromechanical devices. We now give an overview of this
implementation.


8.3.1 Overview of the knowledge base

We have constructed a library of 20 different types of components including wires,
bimetallic strips, springs, and permanent magnets. The library of model fragment
classes consists of approximately 150 different types of model fragment classes
including descriptions of electricity, magnetism, heat, elasticity, and the kinematics
and dynamics of one-dimensional motion (including both rotation and translation).
Each component class has an average of 30 model fragment classes describing different
aspects of its behavior.

8.3.2 Overview of the devices

The  model  selection  program  has  been  tested  on  ten  electromechanical  devices  drawn 
from  [Artobolevsky,  1980;  Macaulay,  1988;  van  Amerongen,  1967].  Table  8.1  shows 
the  names  of  these  ten  devices,  and  the  number  of  components  in  each  of  them.  The 
number  of  components  in  each  device  is  the  sum  of  the  number  of  components  in  the 
original  device  description  and  the  number  of  structural  abstractions  identified  by  the 
system  (see  Section  2.4.2).  One  can  see  that  the  devices  range  in  complexity  from 
only  10  components  in  the  bimetallic  strip  thermostat,  to  54  components  in  the  car 
distributor  system. 

Some of these devices can operate in more than one operating region. Each
operating region corresponds to a different set of inputs to the model selection program.
Hence, different operating regions can have different device structures, expected
behaviors, initial and exogenous values, and thresholds. Table 8.1 shows the number of
operating regions of each device that our model selection program was run on.


Device name                          Number of     Number of
                                     components    operating regions
Bimetallic strip temperature gauge   12            1
Bimetallic strip thermostat          10            2
Flexible wire temperature gauge      13            1
Galvanometer temperature gauge       19            1
Electric bell                        22            2
Magnetic sizing device               22            1
Carbon pile regulator                26            1
Electromagnetic relay thermostat     30            3
Tachometer                           34            1
Car distributor system               54            1

Table 8.1: Number of components and operating regions used in each device.

We now give a brief description of each of these devices, highlighting their most
important aspects from the modeling perspective. Detailed descriptions of these
devices can be found in Appendix B. These devices have been selected primarily to
demonstrate the fact that similar components in different devices can be modeled
differently. In addition, device descriptions also include irrelevant information. In all
cases, our model selection program correctly identifies the right component models,
and disregards irrelevant information.

Bimetallic strip temperature gauge: This is just the temperature gauge in
Figure 1.1. To understand how it works, one must model the heat generated in
the wire due to current flow, and the deflection of the bimetallic strip due to
temperature changes.

Bimetallic strip thermostat: This device contains a bimetallic strip which
regulates the temperature of a room by turning on a heater when the room becomes
too cold. To understand how it works, one must model both the deflection of
the bimetallic strip due to temperature changes and the electrical conductivity
of the bimetallic strip.

Flexible wire temperature gauge: This temperature gauge is very similar to the
one in Figure 1.1, except that the pointer's angular position is determined by
the length of a wire, rather than the deflection of a bimetallic strip. Since the
wire's length depends on its temperature, it follows that to understand how this
temperature gauge works, we must model not only the heat generated in the
wire due to current flow, but also the dependence of the wire's length on its
temperature.

Galvanometer temperature gauge: This temperature gauge is also similar to the
one in Figure 1.1, except that the current in the circuit is measured using a
galvanometer, rather than measuring the deflection of the bimetallic strip. A
galvanometer works by measuring the magnetic field generated by the current
flowing in a coil of wire. Hence, to understand how this temperature gauge
works, we must model the magnetic field generated by the coil of wire, but need
not model the heat generated in the wire.

Electric bell: This device consists of a hammer and a bell. The device goes through
two major operating regions. In the first operating region, an electric circuit
is completed through the hammer, activating an electromagnet which attracts
the hammer, causing the hammer to strike the bell. Hence, to understand the
electric bell's operation in this operating region, we must model the electric and
magnetic properties of the bell. When the hammer strikes the bell, it breaks the
electric circuit, thereby deactivating the electromagnet, and hence allowing the
hammer to return to its original position. To understand why this happens, we
must model the hammer as a spring that restores the hammer's position when
the external electromagnetic force is removed.

Magnetic sizing device: This device is used to measure the size of workpieces in a
factory. It works on the principle that the magnetic flux in a magnetic circuit is
dependent on the length of the air gaps in the circuit. The device is constructed
to make the length of the air gaps proportional to the size of the workpiece. The
important point to note here is that, unlike in a galvanometer, to understand
how this device works, we must model magnetism using a magnetic circuit
ontology (see the discussion in Section 5.2). In addition, the air gaps must be
modeled as magnetic flux conductors.

Carbon pile regulator: This device allows manual regulation of the voltage
supplied to another device. The principle of its operation is that the resistance of
a carbon pile is proportional to the compressive force acting on it. Hence, to
understand its operation, we must model the dependence of the carbon pile's
resistance on the compressive force.

Electromagnetic relay thermostat: This thermostat is similar to the bimetallic
strip thermostat. The primary difference is that in the bimetallic strip
thermostat the bimetallic strip directly turned on the heater, while in this device the
bimetallic strip turns on an electromagnetic relay which turns on the heater.
Understanding the operation of the electromagnetic relay requires the magnetic
circuit ontology mentioned above.

Tachometer: This device, which measures angular speed, is very interesting because
it consists of two similar structures, each of which consists of a coil of wire wound
around an iron core and embedded in an external magnetic field. The interesting
part is that, though these structures are very similar, they are modeled very
differently. The first behaves as an electric generator: the rotation of the coil
in the magnetic field causes an induced voltage. The second behaves as a
galvanometer: the current flowing in the coil causes the coil to deflect in the
external field.

Car distributor system: This is just a description of the distributor system in a
car, including the spark plugs. The interesting part of this device is the air gaps
in the spark plugs which are modeled as electric conductors because of the high
voltage drops across them.

8.3.3  Results 

Table  8.2  shows  a  summary  of  our  experimental  results  on  the  devices  described 
above.  As  mentioned  above,  the  model  selection  program  was  run  on  more  than  one 
operating  region  for  some  of  the  devices.  In  such  cases,  the  numbers  in  this  table 
correspond  to  the  totals  over  all  the  runs  for  that  device. 


Device name                          Estimated    Generated     Time (sec)
                                     space        space         on Explorer II
Bimetallic strip temperature gauge   3.8e16       46            [illegible]
Bimetallic strip thermostat          5.4e12       [illegible]   [illegible]
Flexible wire temperature gauge      2.6e20       78            59.9
Galvanometer temperature gauge       [?].1e31     120           149.8
Electric bell                        6.6e40       117           262.4
Magnetic sizing device               2.1e51       117           456.0
Carbon pile regulator                1.5e49       115           262.5
Electromagnetic relay thermostat     8.7e49       293           472.7
Tachometer                           6.8e58       195           503.6
Car distributor system               9.9e72       160           352.6

Table 8.2: Summary of experimental results


The second column displays the total number of models that (a) have at most
one model fragment from each assumption class; and (b) have a model fragment from
each required assumption class. This number is easy to calculate from the knowledge
base, and it provides a rough estimate of the total number of consistent and complete
models of each device.^1 As one can see, these numbers are very large, ranging from
the order of $10^{12}$ to the order of $10^{72}$. This means that any brute-force search for
adequate models that examines any significant fraction of this space is completely hopeless.
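The estimate in the second column is a simple product. A required assumption class with n model fragments contributes n choices (exactly one fragment must be picked), while any other assumption class contributes n + 1 choices (one of its fragments, or none). The sketch below assumes a hypothetical encoding of each applicable assumption class as a pair `(number_of_fragments, is_required)`; the sizes used are invented for illustration and do not describe any device in the table.

```python
from math import prod
from typing import Iterable, Tuple


def estimated_space(assumption_classes: Iterable[Tuple[int, bool]]) -> int:
    """Count models with at most one fragment per assumption class and
    exactly one fragment from each required assumption class."""
    return prod(n if required else n + 1 for n, required in assumption_classes)


# One required class with 3 fragments and two optional classes with 2 each.
small = estimated_space([(3, True), (2, False), (2, False)])

# Forty independent optional classes with a single fragment each already
# give 2**40 models, which is about 1.1e12.
large = estimated_space([(1, False)] * 40)
print(small, large)
```

Because the count multiplies across assumption classes, it grows exponentially with the number of components, which is why exhaustive enumeration is out of reach even for small devices.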

The third column shows the total number of models actually examined by the
program. This is the sum of the number of models examined during the first phase, when
the program constructs an initial causal model, and the number of models examined
during the second phase, when the program simplifies the model. These numbers
range from a minimum of 46 to a maximum of 293. As one can see, these numbers
are orders of magnitude smaller than the numbers in the second column, making
model selection practical. This clearly demonstrates the utility of the restrictions
introduced in Chapters 5 and 6, including the use of causal approximations.

The fourth column shows the actual run time on an Explorer II. These times
range from a little less than half a minute for the bimetallic strip temperature gauge
to a little over eight minutes for the tachometer. Significantly faster runs have been
observed on different machines using different Lisp implementations. For example,
the tachometer example has been run in a little over a minute and a half on a Sparc
Station 2 under Lucid Lisp version 4.1 [Jon L White, personal communication].

Table 8.3 shows the number of model fragments in the most accurate model,
the initial causal model, and the minimal causal model of each device. The initial
causal model and the minimal causal model are, of course, the ones constructed by
the program using the methods described above. Multiple entries for a single device
correspond to running the program on the different operating regions of that device.

One  can  see  from  this  table  that  the  number  of  model  fragments  in  the  initial 
causal  model  is  significantly  less  than  the  number  in  the  most  accurate  model.  In 
fact,  on  the  average,  the  ratio  of  the  number  of  model  fragments  in  the  initial  causal 
model  to  the  number  in  the  most  accurate  model  is  0.52.  This  shows  that  the  heuristic 
method  is  effective  in  finding  an  initial  causal  model  that  is  significantly  simpler  than 
the  most  accurate  model. 

^1 Of course, calculating the exact number of (a) consistent and complete models; (b) coherent
models; or (c) causal models, of each device can only be done by explicitly checking all the models,
a completely impractical task.

                                     Number of model fragments
Device name                          Most accurate   Initial causal   Minimal causal
                                     model           model            model
Bimetallic strip temperature gauge   75              36               27
Bimetallic strip thermostat          54              38               14
                                     54              39               31
Flexible link temperature gauge      94              60               25
Galvanometer temperature gauge       154             98               28
Electric bell                        177             7                6
                                     177             108              45
Magnetic sizing device               202             122              43
Carbon pile regulator                211             122              51
Electromagnetic relay thermostat     [illegible]     [illegible]      31
                                     [illegible]     [illegible]      36
                                     211             74               14
Tachometer                           285             170              44
Car distributor system               348             178              28

Table 8.3: Number of model fragments in the most accurate model, the initial causal
model, and the minimal causal model constructed by the program.

The table also shows that, in most cases, the minimal causal model is significantly
simpler than the initial causal model. In fact, on the average, the ratio of the number
of model fragments in the minimal causal model to the number in the initial causal
model is 0.33. This shows that the heuristic method of finding an initial causal model
is not, by itself, sufficient to find a minimal causal model, or even a model that is
close to being a minimal causal model: the techniques developed in Chapter 5 are
still necessary.

Finally,  Table  8.4  shows  the  number  of  model  fragments  that  were  dropped  and 
approximated  in  simplifying  the  initial  causal  model  to  get  the  minimal  causal  model. 
Note that the number in the third column corresponds only to the model fragments
that were approximated in the first phase of simplification, but were not dropped in
the  second  phase  of  simplification.  The  number  in  the  second  column  corresponds 
to  the  total  number  of  model  fragments  that  were  dropped  in  the  second  phase  of 
simplification.

                                      Simplifications to the
                                      initial causal model
Device name                           Dropped   Approximated
Bimetallic strip temperature gauge    9         7
Bimetallic strip thermostat           24        3
                                      8         9
Flexible link temperature gauge       35        5
Galvanometer temperature gauge        70        6
Electric bell                         1         0
                                      63        10
Magnetic sizing device                79        6
Carbon pile regulator                 71        12
Electromagnetic relay thermostat      86        10
                                      83        5
                                      60        4
Tachometer                            126       9
Car distributor system                150       9

Table 8.4: Number of model fragments that were dropped and approximated in
simplifying the initial causal model.


8.4  Summary 


In this chapter we described our implemented model selection program, and presented
some experimental results. The model selection program takes four inputs: (a) the
structure of the device; (b) the expected behavior; (c) initial values and exogenous
values; and (d) threshold values. Using this input, the program first finds an initial
causal model using a heuristic technique based on the component interaction heuristic.

The  component  interaction  heuristic  states  that  if  a  set  of  components  are  related 
by  one  or  more  structural  relations  that  support  an  interaction,  and  if  one  of  the 
component  models  is  compatible  with  this  interaction,  then  the  remaining  component 
models  must  be  augmented  to  be  compatible  with  this  interaction.  This  allows  the 
components  in  the  set  to  interact  with  each  other  via  that  interaction.  The  component 
interaction  heuristic  is  implemented  as  a  set  of  heuristic  coherence  constraints.  The 
initial  causal  model  is  required  to  satisfy  all  such  heuristic  coherence  constraints. 
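
The propagation described by the component interaction heuristic can be sketched as follows. This is only a minimal illustration, not the implemented program: the `ComponentModel` class, the encoding of structural relations as (component names, interaction) pairs, and the component and interaction names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentModel:
    """Hypothetical stand-in for the current model of one component."""
    name: str
    # Interactions that the component's current model is compatible with.
    interactions: set = field(default_factory=set)


def apply_component_interaction_heuristic(models, structural_relations):
    """Augment component models until every supported interaction is shared.

    structural_relations is a list of (component_names, interaction) pairs:
    the named components are related by structure that supports `interaction`.
    Returns the (component, interaction) augmentations in the order performed.
    """
    augmented = []
    changed = True
    while changed:  # iterate to a fixed point: one augmentation can enable another
        changed = False
        for names, interaction in structural_relations:
            group = [models[n] for n in names]
            if any(interaction in m.interactions for m in group):
                for m in group:
                    if interaction not in m.interactions:
                        m.interactions.add(interaction)
                        augmented.append((m.name, interaction))
                        changed = True
    return augmented


# Hypothetical device: only the battery model starts out electrical.
models = {n: ComponentModel(n) for n in ("battery", "wire", "coil")}
models["battery"].interactions.add("electrical")
relations = [(("wire", "coil"), "electrical"), (("battery", "wire"), "electrical")]
added = apply_component_interaction_heuristic(models, relations)
```

In this toy run the wire model is augmented first (it is connected to the battery), which in turn forces the coil model to be augmented on the next pass.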

The algorithm for finding the initial causal model essentially enforces the following
constraints: (a) the device model must include all parameters in the expected
behavior; (b) structural coherence constraints; (c) heuristic coherence constraints;
and (d) behavioral coherence constraints. If the device model resulting from enforcing
these constraints is not a causal model, the algorithm augments the device model
in a focussed manner, until a causal model is found.

Finally,  the  initial  causal  model  is  simplified  using  the  techniques  developed  in 
Section  5.7. 

The model selection program has been tested on ten different electromechanical
devices. These devices have been selected primarily to demonstrate that similar
components in different devices are modeled differently. The results of our experiments
show that

1.  brute-force  search  is  indeed  hopeless — the  space  of  possible  device  models  is  just 
too  large; 

2. the restrictions introduced in Chapters 5 and 6 allow the model selection
program to explore a tiny fraction of the enormous search space, making model
selection practical;

3.  the  heuristic  technique  for  finding  an  initial  causal  model  does  result  in  causal 
models  that  are  significantly  simpler  than  the  most  accurate  model;  and 

4.  the  heuristic  technique  is  insufficient  for  finding  adequate  models — the  initial 
causal  model  still  needs  to  be  significantly  simplified  to  find  a  minimal  causal 
model.


Chapter 9
Related  work 


In  this  chapter  we  compare  our  work  to  other  work  in  automated  modeling.  Chapter  7 
includes  a  comparison  of  our  work  on  order  of  magnitude  reasoning  to  other  work  in 
that  area,  and  hence  we  will  not  repeat  it  here. 

One  of  the  earliest  discussions  of  the  importance  of  selecting  adequate  models 
for  efficient  problem  solving  is  found  in  [Amarel,  1968].  Much  work  in  planning 
with abstractions, starting with [Sacerdoti, 1974], has shown how the use of multiple
abstractions can speed up planning. Patil et al. show how medical diagnosis can be
done with multiple models of a patient [Patil et al., 1981], and Sussman and Steele
show how multiple views of an electronic circuit, called slices, can lead to tractable
reasoning [Sussman and Steele Jr., 1980]. A growing body of literature is also focussed
on  creating  new  representations  and  abstractions  from  existing  representations,  e.g., 
see  [Korf,  1980;  Subramanian  and  Genesereth,  1987;  Van  Baalen  and  Davis,  1988; 
Unruh  and  Rosenbloom,  1989;  Christensen,  1990;  Knoblock,  1991]  and  the  articles  in 
[Ellman,  1990;  Ellman,  1992]. 

Rather than reviewing this large body of work, we will focus on the work
related to automatically selecting an adequate model from a space of possible models.
This is in contrast to (a) manual selection of adequate models, e.g., [Sussman and
Steele Jr., 1980]; (b) the work on creating new representations and abstractions; and
(c) cases where all, or most, of the models are used synergistically in problem solving,
e.g., much of the work on abstraction planning. Much of this review will focus on the
work in the domain of physical systems. This is the topic of Section 9.1. Section 9.2
reviews some recent work in logical approaches to the same problem.


9.1  Automated  modeling  of  physical  systems 

In  this  section  we  will  review  the  recent  work  on  automated  modeling  of  physical 
systems. 

9.1.1  Compositional  modeling 

The  work  most  similar  to  ours  is  the  work  on  compositional  modeling  [Falkenhainer 
and  Forbus,  1991;  Falkenhainer  and  Forbus,  1988].  In  this  work,  as  in  ours,  device 
models  are  constructed  by  composing  a  set  of  model  fragments.  Each  model  fragment 
is  conditioned  on  a  set  of  modeling  assumptions  which  explicate  the  approximations, 
perspectives,  granularity,  and  operating  assumptions  underlying  the  model  fragment. 
Mutually  contradictory  assumptions  are  organized  into  assumption  classes,  and  a  set 
of  domain-independent  and  domain-dependent  constraints  are  used  to  govern  the  use 
of  modeling  assumptions.  A  user  query  focuses  the  selection  of  adequate  device  models 
by  requiring  that  every  adequate  model  must  contain  the  terms  mentioned  in  the 
query.  Hence,  an  adequate  device  model  is  a  simplest  model  that  contains  all  the  terms 
mentioned  in  the  query,  and  uses  only  model  fragments  that  are  entailed  by  a  set  of 
mutually  consistent  assumptions  satisfying  all  the  domain-independent  and  domain- 
dependent  constraints.  An  adequate  model  is  constructed  using  a  variant  of  constraint 
satisfaction  called  dynamic  constraint  satisfaction  [Mittal  and  Falkenhainer,  1990], 
and  then  validated  using  either  qualitative  or  numerical  simulation.  If  the  validation 
discovers  any  inconsistencies,  the  process  is  repeated  with  this  additional  information. 

There are many similarities between our work and theirs. First, our use of structural
and behavioral preconditions and coherence constraints is very similar to their
use of assumptions and constraints on the use of assumptions. Second, the first step
of our heuristic algorithm for finding an initial causal model (see Figure 8.1) is similar
to their requirement that an adequate model must contain the terms in the query.
Third, our loop between the second, third, and fourth steps in the algorithm for finding
an initial causal model (see Figure 8.1) is similar to their loop between finding a
model and validating it with behavior generation.

However,  there  are  a  number  of  differences.  The  most  important  difference  is  in  the 
definition  of  model  adequacy:  they  have  no  counterpart  of  the  expected  behavior.  Our 
focus on the task of causal explanation has allowed us to use the expected behavior as a
central  constraint  on  model  adequacy,  thereby  decreasing  the  importance  of  structural 
and  behavioral  coherence  constraints.  Furthermore,  because  of  the  importance  of 
causal  explanations  to  other  tasks  (see  Section  3.2.1),  the  expected  behavior  can  also 
provide  important  constraints  on  model  adequacy  for  other  tasks.  On  the  other  hand, 
in  compositional  modeling,  the  constraints  on  the  use  of  assumptions  play  a  central 
role  in  defining  model  adequacy,  and  any  task  focus  has  to  be  embedded  in  these 
constraints.  Embedding  such  a  task  focus  is,  in  general,  not  easy.  For  example,  it  is 
not  clear  how  the  expected  behavior  of  a  device  can  be  expressed  as  a  set  of  declarative 
constraints. 

A second difference is that we exploit the restrictions introduced in Chapter 5,
especially the use of causal approximations, to develop a polynomial time algorithm for
finding adequate models. Note that the abovementioned decrease in the importance
of coherence constraints means that the restriction on their expressive power (see
Section 5.6) has not proved to be serious. On the other hand, the constraints on the use
of assumptions play a central role in compositional modeling, so no such restriction
in expressive power is possible. Hence, their model selection algorithm is based on
dynamic constraint satisfaction, which can, in the worst case, take exponential time.

A third difference is that, while their system uses either qualitative simulation or
numerical simulation for behavior generation, ours uses order of magnitude reasoning.
At the beginning of Chapter 7 we discussed the advantages of order of magnitude
reasoning over purely qualitative or purely numerical methods. The primary
disadvantage of using our order of magnitude reasoning technique, compared to qualitative
or numerical simulation, is that it is currently restricted to generating the behavior
at a fixed point in time.

A fourth difference lies in the handling of multiple operating regions of a device.
While they generate a single model for all operating regions of a device, we can
generate different models for each operating region. However, both techniques have
their  drawbacks.  The  drawback  with  their  technique  is  that  the  single  model  may 
not  be  the  most  appropriate  model  in  all  operating  regions.  The  drawback  with  our 
technique  is  that  we  need  a  new  set  of  inputs  for  each  new  operating  region. 


9.1.2  Graphs  of  models 

The  work  on  graphs  of  models  [Addanki  et  al.,  1991]  discusses  a  technique  for  selecting 
models  of  acceptable  accuracy.  A  graph  of  models  is  a  graph  in  which  the  nodes  are 
models  and  the  edges  are  assumptions  that  have  to  be  changed  in  moving  from  one 
model  to  another.  A  model  in  this  graph  has  acceptable  accuracy  if  its  predictions 
are free of conflicts. Conflicts are detected either empirically or internally. Empirical
conflicts are detected by experimentally verifying a model’s predictions, while internal
conflicts are detected by checking the model’s predictions against a set of consistency
rules that capture the model’s assumptions. When a conflict is detected, a set of
domain-dependent parameter change rules help to select a more accurate model, and
the above process is repeated. Analysis begins with the simplest model in the graph
of models, and terminates when an accurate enough model has been found.
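
The analysis loop just described can be sketched roughly as below. This is an illustrative reading of the description above rather than the algorithm of [Addanki et al., 1991]: the dictionary encoding of the graph, the `detect_conflict` and `choose_next` callables (the latter standing in for the parameter change rules), and the model names are assumptions made for the example.

```python
def select_model(graph, simplest, detect_conflict, choose_next):
    """Walk a graph of models from the simplest model until no conflict remains.

    graph:           dict mapping a model name to the names of adjacent models
    detect_conflict: callable(model) -> conflict description, or None
    choose_next:     callable(model, conflict, neighbours) -> next model name
                     (a stand-in for the parameter change rules)
    Returns the selected model and the list of models visited on the way.
    """
    model, visited = simplest, [simplest]
    while True:
        conflict = detect_conflict(model)
        if conflict is None:
            return model, visited  # predictions are conflict free: stop here
        neighbours = [m for m in graph.get(model, []) if m not in visited]
        if not neighbours:
            raise LookupError(f"no conflict-free model reachable from {simplest!r}")
        model = choose_next(model, conflict, neighbours)
        visited.append(model)


# Hypothetical graph of models for a sliding block.
graph = {
    "ideal": ["friction", "drag"],
    "friction": ["friction+drag"],
    "drag": ["friction+drag"],
}
conflicts = {"ideal": "predicted speed too high", "friction": "predicted speed too high"}
chosen, path = select_model(
    graph,
    "ideal",
    detect_conflict=conflicts.get,                 # None means no conflict
    choose_next=lambda model, conflict, ns: ns[0],  # trivial rule: take the first neighbour
)
```

The sketch makes the cost of always starting from the simplest model visible: every model on `path` before the final one is analyzed only to be rejected.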

An  important  difference  between  their  work  and  ours  is  the  representation  of 
the  space  of  models.  They  use  an  explicit  representation  of  this  space  as  a  graph 
of  models,  while  we  have  an  implicit  representation  as  a  set  of  model  fragments 
that  can  be  combined  in  different  ways  to  produce  an  exponentially  large  number  of 
device  models.  Our  approach  leads  to  greater  flexibility  in  tailoring  models  to  specific 
situations.  To  get  comparable  flexibility  in  the  graphs  of  models  approach  requires 
an  explicit  representation  of  an  exponentially  large  space  of  device  models,  which  is 
quite  impractical.  An  advantage  of  their  approach  is  that  each  model  in  the  graph 
can  have  a  specialized  problem  solver,  while  we  must  have  a  general  purpose  problem 
solver  that  is  applicable  to  all  models. 

The consistency rules used to verify a model’s predictions are similar to our behavioral
preconditions and coherence constraints. However, we do not validate a model’s
predictions empirically, and we have not explicitly addressed the problem of switching
to a more accurate model in light of a conflict. Our techniques are best viewed
as providing an intelligent method for selecting an initial model. Since they always
start the analysis with the simplest model, making no effort to identify a better starting
model, our techniques are complementary to theirs: select an initial model using
our technique, and do model switching using theirs (but see the next section for an
alternative model switching technique).


9.1.3 Fitting approximations

In [Weld, 1990], Weld introduces an interesting class of approximations called fitting
approximations. Informally, a model M2 is a fitting approximation of a model M1
if M1 contains an exogenous parameter, called a fitting parameter, such that the
predictions using M1 approach the predictions using M2, as the fitting parameter
approaches a limit. Weld shows that when all the approximations are fitting
approximations, the domain-dependent parameter change rules discussed above can be
replaced  by  a  domain-independent  technique  for  model  switching. 
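
The informal definition can be illustrated numerically. The example below is our own toy illustration and is not taken from [Weld, 1990]: the two circuit models, the function names, and the numeric values are assumptions. The more accurate model includes a wire resistance, which plays the role of the fitting parameter; the simpler model treats the wire as ideal.

```python
def current_with_wire_resistance(v, r_load, r_wire):
    """More accurate model (M1): r_wire acts as the fitting parameter."""
    return v / (r_load + r_wire)


def current_ideal_wire(v, r_load):
    """Simpler model (M2): the limiting case r_wire -> 0."""
    return v / r_load


def prediction_gaps(v, r_load, wire_values):
    """Absolute difference between the two predictions for each r_wire value."""
    ideal = current_ideal_wire(v, r_load)
    return [abs(current_with_wire_resistance(v, r_load, r) - ideal)
            for r in wire_values]


# As the fitting parameter approaches its limit (0), the predictions converge.
gaps = prediction_gaps(v=10.0, r_load=100.0, wire_values=[10.0, 1.0, 0.1, 0.01])
```

Each successive entry of `gaps` is smaller than the one before, which is the behavior the informal definition asks for.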

Fitting approximations and causal approximations are fundamentally incomparable
because the former talks about behavior differences, while the latter talks about
causal dependencies. However, in practice, it appears that fitting approximations
are also causal approximations. For example, all the fitting approximations given in
[Weld, 1991] are also causal approximations. This means that his domain-independent
technique for model switching can be easily incorporated into our system.


9.1.4  Critical  abstractions 

In  [Williams,  1991a],  Williams  introduces  the  notion  of  a  critical  abstraction,  which 
is  a  parsimonious  description  of  a  device  relative  to  a  set  of  questions.  Given  a  device 
model,  he  constructs  a  critical  abstraction  in  three  steps:  (a)  eliminating  superfluous 
interactions;  (b)  aggregating  interactions  that  are  local  to  a  single  mechanism  using 
symbolic algebra; and (c) further abstracting the aggregated interactions.

His motivations for creating critical abstractions are very similar to our motivations
for finding minimal causal models: we are both striving to find parsimonious
descriptions of how a device works. Furthermore, his abstraction process is similar to
our model simplification procedure. In fact, the first step of his abstraction process,
which eliminates superfluous interactions, is similar to the last step of our simplification
procedure, which drops all irrelevant model fragments. The primary difference
between our approaches is one of emphasis: we have focussed on the problem of selecting
approximations from a prespecified space of possible approximations, while he
has focussed on finding techniques for automatically abstracting a base model.


9.1.5  Model-based  diagnosis  with  multiple  models 

One  of  the  original  inspirations  for  the  work  described  in  this  thesis  was  Davis’s  work 
on  model-based  diagnosis  [Davis,  1984].  In  that  work,  Davis  presents  a  diagnostic 
method based on tracing paths of causal interactions. He argues that the power of
the  approach  stems  not  from  the  specific  diagnostic  method,  but  from  the  model  which 
specifies  the  allowed  paths  of  causal  interaction.  He  shows  that  efficient  diagnosis, 
while  retaining  completeness,  can  be  obtained  by  initially  considering  models  with 
only  a  few  paths  of  interactions,  and  adding  in  additional  paths  when  the  model  fails 
to  account  for  the  symptoms.  He  also  introduces  the  notion  of  adjacency:  components 
are  adjacent  to  each  other  if  they  can  interact  with  each  other  by  some  means. 

While  we  have  not  focussed  on  the  task  of  diagnosis,  one  can  see  that  our  simplicity 
ordering  on  models  lends  itself  to  the  above  diagnosis  technique:  diagnosis  starts  with 
the  minimal  causal  model,  with  successively  more  complex  models  being  used  if  a 
model  is  unable  to  account  for  the  symptoms.  The  restrictions  in  Chapter  5  ensure 
that  using  more  complex  models  will  add  new  paths  of  causal  interaction.  In  addition, 
the  component  interaction  heuristic  introduced  in  Chapter  8  is  closely  related  to  the 
notion of adjacency: adjacent components must have compatible models.


9.1.6 Reasoning about model accuracy

Accuracy  is  a  very  important  characteristic  of  an  adequate  device  model:  a  model 
must  be  sufficiently  accurate  to  be  useful.  In  this  thesis  we  have  not  developed 
any  sophisticated  techniques  for  reasoning  about  model  accuracy.  In  particular,  a 
model  is  deemed  to  be  accurate  enough  if  it  satisfies  all  the  behavioral  preconditions 
and  coherence  constraints,  with  different  levels  of  accuracy  corresponding  to  different 
settings  of  the  thresholds.  However,  our  system  does  not  reason  about  the  settings 
of  the  thresholds;  the  threshold  values  are  part  of  the  input.  In  this  section  we  give 
a  brief  overview  of  some  ongoing  work  by  various  authors  on  the  topic  of  reasoning 
about  model  accuracy. 

In [Nayak, 1991], we present a domain-independent method for validating approximate
equilibrium models against more accurate models. The method makes
predictions based on the approximate model, and estimates the error in these predictions,
with respect to the more accurate model. We also derive conditions under
which the estimated error is guaranteed to be an upper bound on the actual error.

Shirley  and  Falkenhainer  develop  a  framework  for  reasoning  about  model  accuracy, 
and  show  how  approximate  models  involving  differential  equations  can  be  validated 
against  a  base  model,  using  known  accuracy  requirements  on  certain  parameters 
[Shirley  and  Falkenhainer,  1990].  Error  estimation  is  done  by  computing  a  linear 
approximation of an error function, and numerically integrating it over the interval
of  interest.  The  validity  of  this  method  is  based  on  the  assumption  that  the  linear 
terms  in  the  error  function  dominate  the  higher  order  terms. 

Falkenhainer  extends  the  above  techniques  in  two  ways.  In  [Falkenhainer,  1992b], 
he  shows  how  accuracy  measures  obtained  from  earlier  problem  solving  episodes  can 
be  used  to  predict  accuracy  bounds  for  models  in  new  settings.  In  [Falkenhainer, 
1992a],  he  shows  how  idealizations  can  be  constructed  using  two  common  idealization 
assumptions.  He  also  develops  an  error  estimation  technique,  based  on  sampling  the 
error’s  behavior  and  fitting  a  polynomial  to  it,  and  uses  it  to  define  the  applicability 
region  of  a  model. 

An alternate method for constructing idealizations and their applicability regions
is presented in [Raiman and Williams, 1992]. This method proceeds by first finding
ordinal relations between terms using MINIMA [Williams, 1991b], and then exaggerating
these ordinal relations, e.g., replacing “>” by “≫”. Using the resulting order
of magnitude relations, the equations are simplified, and the applicability region of
each simplified equation is determined by ESTIMATES [Raiman, 1991].

Finally, Weld and Addanki introduce the help/hinder heuristic, a query-directed
technique for the automatic generation of an approximate device model [Weld and
Addanki, 1991]. The help/hinder heuristic generates an approximate model by systematically
introducing overestimates and underestimates in component models so
that the predictions of the approximate model are guaranteed to be sound.

9.1.7  Microscopic  ontologies 

While much of the research in qualitative reasoning about physical systems has focused
on macroscopic theories of the domain, some researchers have proposed the use
of both macroscopic and microscopic domain theories: Hayes gives the macroscopic
contained stuff ontology and the microscopic piece of stuff ontology for reasoning
about liquids [Hayes, 1985]; Collins and Forbus develop a specialization of the above
“piece of stuff” ontology, called the molecular collection ontology, and present techniques
for generating and reasoning with fluids as “pieces of stuff” [Collins and Forbus,
1987]; Rajamoney and Koo present a qualitative representation for microscopic theories
and describe a method for obtaining the macroscopic behavior from such theories
[Rajamoney and Koo, 1990].

However,  most  of  the  work  in  this  area  has  not  focused  on  the  problem  of  selecting 
an  appropriate  ontology.  A  notable  exception  is  [Liu  and  Farley,  1990],  in  which  they 
present  a  query-driven  method  for  selecting  and  shifting  between  macroscopic  and 
microscopic  domain  theories.  The  selection  and  shift  of  ontologies  is  driven  by  a  set 
of  ontological  choice  rules.  However,  the  generality  and  scope  of  these  rules  is  not 
clear. 

While we have not actually developed any microscopic domain theories, the discussion
in Section 5.2 applies in a straightforward manner: if the macroscopic and
microscopic domain theories are mutually consistent, then the techniques developed
in this thesis provide a general method for selecting the appropriate ontology. Mutual
consistency seems likely because the macroscopic and microscopic theories are usually
used for different purposes, e.g., in [Collins and Forbus, 1987], the macroscopic
“contained stuff” ontology is used to establish global properties like temperature and
pressure gradients, while the microscopic “molecular collection” ontology is used to
determine how a molecular collection moves through the system. Furthermore, the
“molecular collection” ontology is parasitic (i.e., dependent) upon the “contained stuff”
ontology, and hence the two ontologies are mutually consistent.


9.2  Logical  approaches 

Logical  approaches  to  dealing  with  multiple  domain  theories  have  been  proposed  by 
Hobbs  and  by  McCarthy.  Hobbs  outlines  a  framework  for  a  theory  of  granularity, 
which  is  a  means  of  constructing  simpler  theories  out  of  more  complex  ones  using  the 
notion  of  indistinguishability  with  respect  to  a  set  of  relevant  predicates  [Hobbs,  1985]. 
He  proposes  the  use  of  a  set  of  articulation  axioms  to  link  the  different  granularities, 
and to allow shifts of perspective during problem-solving. McCarthy introduces the
notion of context, which captures the implicit assumptions underlying any axiomatization
[McCarthy, 1987]. He proposes that all axioms make assertions about some
context, and a set of nonmonotonic rules allow inheritance to more general and more
specific contexts.

However, both the above proposals lack detail. In his thesis [Guha, 1991], Guha
works  out  some  of  the  details  of  McCarthy’s  proposal  and  demonstrates  the  use  of 
contexts  in  CYC.*  He  develops  a  syntax,  semantics,  and  proof  theory  of  a  language  for 
expressing  the  contextual  dependence  of  axioms.  He  introduces  lifting  axioms,  which 
allow  a  formula  in  one  context  to  be  converted  into  an  equivalent  formula  in  another 
context.  He  also  introduces  a  set  of  default  lifting  axioms,  which  he  says  takes  care 
of  a  majority  of  the  lifting. 

Guha’s work is certainly more ambitious in scope than ours, since it attempts to
deal with general first-order theories, rather than equation models of physical systems.
However, when restricted to modeling physical systems, his approach seems
very similar to Falkenhainer and Forbus’s work on compositional modeling [Falkenhainer
and Forbus, 1991]. In particular, modeling is primarily driven by the terms
mentioned in the query and the set of lifting axioms. Furthermore, it is not at all clear
that his default lifting axioms will be of much help in selecting appropriate approximations.
This is in contrast with our work, where the modeling is primarily driven
by the expected behavior. Hence, the difference between our work and compositional
modeling can be reiterated here (specifically the first two differences).

*CYC is a large, multi-domain, common sense knowledge base, described in [Lenat and Guha,
1990].


Chapter  10 
Conclusions 


In  this  thesis  we  investigated  the  problem  of  automatically  selecting  adequate  models 
for  physical  systems.  We  will  now  present  a  summary  of  the  techniques  developed 
in  this  thesis,  reiterating  the  main  contributions.  We  will  then  suggest  directions  for 
future  work. 

10.1  Summary  and  contributions 

We formulated the problem of selecting adequate models as a search problem, requiring
answers to the following three questions:

•  What  is  a  model,  and  what  is  the  space  of  possible  models? 

•  What  is  an  adequate  model? 

•  How  do  we  search  the  space  of  possible  models  for  adequate  models? 

We defined a model as a set of model fragments, where a model fragment is a
set of independent algebraic, qualitative, and/or differential equations that partially
describes some physical phenomenon. The space of possible models was defined implicitly
by the set of applicable model fragments: different subsets of this set of
applicable model fragments correspond to different models. The set of applicable
model fragments was defined by (a) the structure of the physical system, which specifies
the system’s components; and (b) a component library, which specifies the types
of  model  fragments  that  can  be  used  to  model  each  type  of  component. 
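
The implicit definition of the model space can be made concrete with a short sketch. The fragment names below are hypothetical, and the enumeration deliberately ignores consistency, completeness, and the other constraints on adequate models; it only illustrates that every subset of the applicable model fragments is a candidate, so the space grows as 2^n.

```python
from itertools import chain, combinations


def candidate_models(applicable_fragments):
    """Yield every subset of the applicable model fragments as a frozenset."""
    fragments = sorted(applicable_fragments)
    subsets = chain.from_iterable(
        combinations(fragments, k) for k in range(len(fragments) + 1)
    )
    return (frozenset(s) for s in subsets)


def model_space_size(n_fragments):
    """Number of candidate models for n applicable model fragments."""
    return 2 ** n_fragments


# Three hypothetical fragments already give eight candidate models.
fragments = {"ideal-wire", "resistor", "thermal-expansion"}
models = list(candidate_models(fragments))
```

With only 30 applicable fragments, `model_space_size(30)` already exceeds one billion candidates, which is why the space is never enumerated explicitly.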

We  gave  a  clear  definition  of  model  adequacy,  which  was  tuned  to  the  task  of 
generating  parsimonious  causal  explanations.  An  adequate  model  was  defined  as  a 
consistent  and  complete  model  that  could  explain  the  phenomenon  of  interest.  In 
addition, an adequate model was required to satisfy any domain-independent and
domain-dependent constraints on the structure and the behavior of the physical system.
Finally, an adequate model was required to be as simple as possible, with
model simplicity being based on the intuition that modeling fewer phenomena more
approximately leads to simpler models.

We  then  developed  a  formal  statement  of  the  problem  of  finding  adequate  models, 
and  showed  that,  in  general,  the  problem  is  intractable  (NP-hard).  We  also  identified 
three different sources of intractability: (a) deciding what phenomena to model, i.e.,
deciding which assumption classes to select; (b) deciding how to model selected
phenomena, i.e., deciding which model fragment to use from each selected assumption
class; and (c) having to satisfy all the domain-independent and domain-dependent
constraints. We also showed that some related problems are intractable, e.g., the
problem of finding coherent models.

The intractability of the problem of finding adequate models means that, in general,
we can’t do much better than search the whole space of possible models. Unfortunately,
even for simple devices, the space of possible models is extremely large,
making any sort of brute force search completely impractical. To address this problem,
we introduced a set of restrictions on the space of possible models, and used
these restrictions to develop an efficient algorithm for finding adequate models. The
most  significant  such  restriction  was  that  all  the  approximation  relations  between 
model  fragments  were  required  to  be  causal  approximations.  However,  this  does  not 
appear  to  be  a  serious  restriction  since  most  of  the  commonly  used  approximations 
are  causal  approximations. 
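
To give a rough sense of why brute force search is impractical, the following sketch counts candidate models under a simplifying assumption of ours (not the formal definition used in the thesis): each assumption class contributes either no model fragment or exactly one of its fragments. The class sizes are hypothetical.

```python
from math import prod

def count_candidate_models(fragments_per_class):
    """Count candidate models, assuming each assumption class contributes
    either nothing or exactly one of its model fragments."""
    return prod(n + 1 for n in fragments_per_class)

# Hypothetical device: 20 assumption classes with 3 model fragments each.
sizes = [3] * 20
print(count_candidate_models(sizes))  # equals 4**20
```

Even this modest hypothetical device yields about a trillion candidates, which is consistent with the need for the restrictions described above.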

Our  definition  of  model  adequacy  requires  us  to  generate  the  behavior  of  a  device. 
To this end, we developed a novel order of magnitude reasoning technique which
strikes a balance between purely qualitative and purely quantitative methods. In this
technique, the order of magnitude of a parameter is defined on a logarithmic scale,
and a set of rules is used to propagate orders of magnitude through equations. A
novel  feature  of  the  set  of  propagation  rules  is  that  they  effectively  handle  non-linear 
simultaneous  equations,  using  linear  programming  in  conjunction  with  backtracking. 
We  showed  that  order  of  magnitude  reasoning  using  this  technique  is  intractable,  and 
developed  an  approximate  reasoning  scheme  that  works  well  in  practice. 
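
As a minimal illustration of what propagating orders of magnitude on a logarithmic scale can look like, the sketch below represents each positive quantity by bounds on its base-10 logarithm and propagates those bounds through a product and a sum. This is an assumption-laden toy, not the rule set of the thesis, which handles non-linear simultaneous equations using linear programming and backtracking. The function names are hypothetical.

```python
import math

def om_product(a, b):
    """Bounds on log10(x*y), given bounds a on log10(x) and b on log10(y)."""
    return (a[0] + b[0], a[1] + b[1])

def om_sum(a, b):
    """Bounds on log10(x+y) for positive x and y with log10 bounds a and b."""
    lo = math.log10(10 ** a[0] + 10 ** b[0])
    hi = math.log10(10 ** a[1] + 10 ** b[1])
    return (lo, hi)

# x lies between 10 and 100; y lies between 1e3 and 1e4.
x, y = (1.0, 2.0), (3.0, 4.0)
print(om_product(x, y))  # (4.0, 6.0)
print(om_sum(x, y))      # bounds dominated by the larger quantity y
```

The sum illustrates the characteristic behavior of such reasoning: when one term is much larger than the other, the result has essentially the order of magnitude of the larger term.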

Finally,  we  described  an  implemented  model  selection  program  based  on  the  above 
techniques.  This  program  includes  a  heuristic  method,  based  on  the  component 
interaction  heuristic,  for  finding  an  initial  causal  model.  The  model  selection  program 
was  tested  on  a  variety  of  electromechanical  devices.  These  tests  provided  empirical 
evidence  for  the  theoretical  claims  made  in  the  rest  of  the  thesis. 

10.2  Future  work 

The  work  described  in  this  thesis  can  be  extended  in  a  number  of  different  ways.  We 
now  discuss  four  specific  directions  for  future  work. 

Expressivity  of  the  expected  behavior 

In  this  thesis  we  represented  the  expected  behavior  as  a  causal  relation  between 
parameters.  While  this  representation  has  proved  to  be  useful,  it  is  clearly  not  very 
expressive.  More  expressive  languages  will  allow  us  to  represent  a  wider  range  of 
expected  behaviors.  For  example,  in  addition  to  causal  relations  between  parameters, 
we may want to include information about the relative directions of change (increasing
$T_t$ causes $\theta_p$ to decrease), we may want to include information about specific functional
relationships between parameters ($T_t$ and $\theta_p$ are linearly related), or we may want
languages for expressing the device’s function (e.g., see the papers on functional
reasoning  in  [Chandrasekaran,  1991]). 

While developing more expressive languages is in itself not difficult, the real challenge
is to develop more expressive tractable languages. This is important because
a central goal of selecting adequate models is to aid effective problem solving. This
goal is compromised if the model selection method resulting from using an expressive
language  for  the  expected  behavior  is  itself  intractable.  Hence,  an  important  direction 
of  future  research  is  the  development  of  more  expressive  languages  for  expressing  the 
expected  behavior  that  still  allow  efficient  model  selection  algorithms. 

Reasoning  about  model  accuracy 

In  comparing  our  work  to  the  work  on  compositional  modeling  (see  Section  9.1.1), 
we  noted  that  expressing  the  expected  behavior  as  a  set  of  declarative  constraints 
is  difficult.  A  similar  comment  applies  to  our  work  with  respect  to  reasoning  about 
model  accuracy. 

Accuracy  is  a  very  important  characteristic  of  adequate  device  models:  a  model 
must  be  sufficiently  accurate  to  be  useful.  In  our  work,  a  model  is  deemed  to  be 
accurate  enough  if  all  the  behavioral  preconditions  and  coherence  constraints  are 
satisfied,  with  the  level  of  accuracy  being  determined  by  the  settings  of  the  thresholds. 
However,  we  do  no  reasoning  about  the  settings  of  these  thresholds:  they  are  part  of 
the  input.  This  places  a  heavy  burden  on  the  user:  it  is  not  easy  to  craft  a  set  of 
behavioral  constraints,  with  appropriately  set  thresholds,  that  ensure  that  adequate 
models  are  accurate  enough,  while  also  ensuring  that  models  are  not  required  to  be 
too  accurate. 

A  much  better  approach  is  to  allow  the  user  to  specify  the  desired  accuracy  of 
the  model  much  more  easily,  e.g.,  by  specifying  tolerances  on  certain  parameters.  We 
then  need  to  develop  techniques  for  finding  models  that  guarantee  that  predictions 
will  lie  within  the  specified  tolerances.  In  Section  9.1.6  we  discussed  some  initial  work 
along  these  lines,  but  much  still  needs  to  be  done. 
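
One naive way to read the tolerance idea is sketched below: given candidate predictions ordered from simplest to most accurate, pick the simplest one whose prediction lies within the user's tolerance of the most accurate prediction. This is only an illustration under our own assumptions, not a technique from the thesis; the material constants are rough illustrative values and all names are hypothetical.

```python
def simplest_within_tolerance(candidates, reference, tolerance):
    """candidates: list of (name, prediction), ordered simplest first.
    Return the first name whose prediction is within tolerance of reference,
    or None if no candidate qualifies."""
    for name, prediction in candidates:
        if abs(prediction - reference) <= tolerance:
            return name
    return None

# Illustrative values, roughly those of a metallic conductor.
rho0, alpha, T0 = 1.7e-8, 4e-3, 293.0

def rho_dependent(T):
    """Temperature dependent resistivity: rho0 * (1 + alpha * (T - T0))."""
    return rho0 * (1 + alpha * (T - T0))

T = 303.0
reference = rho_dependent(T)  # treated here as the most accurate prediction
candidates = [
    ("constant resistivity", rho0),
    ("temperature dependent resistivity", reference),
]
print(simplest_within_tolerance(candidates, reference, 0.05 * reference))
print(simplest_within_tolerance(candidates, reference, 0.01 * reference))
```

With a 5% tolerance the constant model suffices at this temperature, while a 1% tolerance forces the temperature dependent model.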

Multiple  operating  regions 

Many  devices  go  through  multiple  operating  regions  during  the  course  of  their  normal 
operations.  Different  operating  regions  can  have  different  characteristics,  requiring 
the use of different models. In this thesis, multiple operating regions are handled by
requiring  a  new  set  of  inputs  for  each  region,  thereby  allowing  us  to  tailor  a  model 
for each operating region. However, this requires that the user be aware of each of
the device’s operating regions. While this may be reasonable in the context of design,
e.g.,  see  [Iwasaki  and  Chandrasekaran,  1992],  it  seems  undesirable  in  most  situations. 

The above shortcoming can be addressed as follows. First, behavior generation
must  include  some  form  of  simulation,  so  that  the  different  operating  regions  can 
be  discovered  automatically.  While  both  qualitative  and  numerical  simulation  can  be 
used,  an  interesting  alternative  is  to  develop  a  generalization  of  our  order  of  magnitude 
reasoning  technique  that  allows  simulation. 

Second,  we  need  to  develop  techniques  for  inferring  the  expected  behavior  of  each 
operating  region,  given  the  overall  expected  behavior  of  the  device.  For  example, 
the  expected  behavior  of  the  ignition  system  in  an  automobile  can  be  expressed  as 
follows:  “turning  the  ignition  key  causes  the  engine  to  start.”  The  ignition  system 
goes  through  a  series  of  operating  regions  to  achieve  this  expected  behavior  (see 
[Macaulay,  1988]  for  details).  Rather  than  specifying  the  expected  behavior  of  each 
operating  region,  it  is  desirable  to  have  them  automatically  inferred.  We  believe 
that  techniques  for  doing  such  inference  will  be  tightly  integrated  with  the  simulation 
technique  used  to  generate  the  operating  regions. 

Other  tasks 

The  model  selection  techniques  developed  in  this  thesis  have  been  focussed  on  the  task 
of generating parsimonious causal explanations. This suggests a natural direction for
future  work — developing  model  selection  techniques  for  other  tasks.  A  particularly 
promising  task  appears  to  be  diagnosis,  where  there  is  an  emerging  understanding 
of  what  it  means  for  a  model  to  be  adequate  for  diagnosis  [Davis,  1984;  Hamscher, 
1991].  Furthermore,  as  the  discussion  in  Section  9.1.5  suggests,  we  believe  that  the 
techniques  developed  in  this  thesis  will  prove  valuable  in  developing  methods  for 
selecting  adequate  models  for  diagnosis. 


Appendix  A 

Examples  of  causal 
approximations 


In this appendix we present a list of commonly used approximations that can be expressed
as causal approximations. Most of these approximations have been borrowed
from the fitting approximations listed in [Weld, 1991], though most of the actual
equations  have  been  adapted  from  [Halliday  and  Resnick,  1978]. 

Each of the items in this list corresponds to a single assumption class. We provide
a brief description of the various ways of modeling each phenomenon. The equations of
these  different  model  fragments  are  then  presented  in  a  tabular  form,  with  a  horizontal 
line  separating  the  different  model  fragments.  Model  fragments  lower  in  the  table  are 
approximations  of  model  fragments  higher  in  the  table,  while  model  fragments  at 
the  same  level  are  not  approximations  of  each  other.  It  is  easy  to  verify  that  all  the 
approximations listed here are causal approximations.
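
The ordering conveyed by these tables (lower fragments approximate higher ones, fragments at the same level do not approximate each other) can be captured with a small relation. The sketch below is one hypothetical encoding, using the gravitational field and thermal conduction classes from this appendix; the dictionary layout and function name are our own assumptions.

```python
# Each fragment maps to the fragments it directly approximates.
APPROXIMATES = {
    "zero gravity": {"constant gravity"},
    "constant gravity": {"Newton's law of gravity"},
    "Newton's law of gravity": set(),
    "ideal thermal insulator": {"thermal conduction"},
    "ideal thermal conductor": {"thermal conduction"},
    "thermal conduction": set(),
}

def is_approximation_of(simpler, accurate, relation=APPROXIMATES):
    """True if `simpler` approximates `accurate`, directly or transitively."""
    stack = list(relation.get(simpler, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current == accurate:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(relation.get(current, ()))
    return False

print(is_approximation_of("zero gravity", "Newton's law of gravity"))
print(is_approximation_of("ideal thermal insulator", "ideal thermal conductor"))
```

The first query follows the chain through constant gravity; the second shows that two fragments at the same level are not related.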


1.  Translational  inertia 

Newton’s  second  law  of  motion  predicts  that  the  acceleration,  a,  of  a  body  of 
mass, m, is proportional to the net force, F, acting on the body. It is common
to approximate this law by assuming that the mass, and hence the net force, is
zero.

Newton’s second law          $F = ma$
------------------------------------------------------------
No translational inertia     $F = 0$

2.  Rotational  inertia 

This is similar to translational inertia. Newton’s second law of motion predicts
that the angular acceleration, $\alpha$, of a body of moment of inertia, $I$, is proportional
to the net torque, $\tau$, acting on the body. It is common to approximate
this law by assuming that the moment of inertia, and hence the net torque, is
zero.

Newton’s second law       $\tau = I\alpha$
------------------------------------------------------------
No rotational inertia     $\tau = 0$

3.  Relativistic  mass 

Einstein’s special theory of relativity predicts that the mass, m, of an object
increases as its velocity, v, increases. The mass at zero velocity is called the rest
mass, $m_0$. However, this effect is noticeable only at velocities approaching the
speed of light, c. At more ordinary velocities, it is common to assume that the
mass is constant.

Special theory of relativity     $m = m_0 / \sqrt{1 - (v/c)^2}$
------------------------------------------------------------
Non-relativistic mass            exogenous(m)

4.  Relativistic  motion 

Let S and S' be observers such that S' is moving at velocity v with respect to
S. Let S and S' observe the same event. Let S record the time and position
of the event as t and x, and let S' record the time and position of the event
as t' and x'. The relationship between x, x', t, and t' is given by the Lorentz
transformation. However, at velocities much smaller than the speed of light, c,
it is common to use the simpler Galilean transformations.

Lorentz transformation      $x' = (x - vt) / \sqrt{1 - (v/c)^2}$
                            $t' = (t - (v/c^2)x) / \sqrt{1 - (v/c)^2}$
------------------------------------------------------------
Galilean transformation     $x' = x - vt$
                            $t' = t$

5.  Deformable  bodies 

When elastic bodies are acted upon by a force, F, they deform by an amount,
x. The deformation is proportional to the force (k is the constant of proportionality),
and the relationship between the two is given by Hooke’s law. However,
it is common to assume that bodies are rigid, so that there is no deformation
caused by an applied force.

Hooke’s law      $F = -kx$
------------------------------------------------------------
Rigid bodies     $x = 0$

6.  Friction 

When two bodies move against each other, a frictional force, f, impedes the
motion. The frictional force is proportional to the force, N, acting normal to
the direction of motion, and the constant of proportionality, $\mu$, is called the coefficient
of friction. However, when motion involves smooth surfaces, it is common to
disregard the frictional force.

Motion with friction     $f = \mu N$
------------------------------------------------------------
Frictionless motion      $f = 0$

7.  Gravitational  fields 

Newton’s  law  of  gravitation  predicts  that  the  acceleration  due  to  gravity,  g,  at  a 
distance  r  from  an  object  of  mass  M  is  proportional  to  the  mass  and  is  inversely 
proportional  to  the  square  of  the  distance  (the  constant  of  proportionality  is 
the gravitational constant, G). When the variation in r is small compared to the
magnitude of r, it is common to assume that the acceleration due to gravity
is essentially constant. This can be further approximated, when r becomes
sufficiently large, by assuming that the acceleration due to gravity is essentially
zero.

Newton’s law of gravity     $g = GM/r^2$
------------------------------------------------------------
Constant gravity            exogenous(g)
------------------------------------------------------------
Zero gravity                $g = 0$

8.  Collisions 

Collisions between objects are typically inelastic. If an object approaches a
stationary wall at velocity $v_i$, then the velocity after the collision, $v_f$, is attenuated
by the coefficient of restitution, $\alpha$. This is often approximated by assuming
that the collision is elastic, so that the initial and final velocities are equal in
magnitude.

Inelastic collision     $v_f = -\alpha v_i$
------------------------------------------------------------
Elastic collision       $v_f = -v_i$

9.  Gas  laws 

The ideal gas law provides a relationship between the pressure, P, the volume,
V, and the temperature, T, of a mole of gas. A more accurate gas law is the Van
der Waals equation of state, which accounts for the non-zero size of gas molecules
and for the fact that gas molecules repel each other at short distances. In these equations,
R is the universal gas constant, and a and b are experimental constants.

Van der Waals gas     $(P + a/V^2)(V - b) = RT$
------------------------------------------------------------
Ideal gas law         $PV = RT$

10.  Thermal  conduction 

The rate of heat flow, f, across a thermal conductor is proportional to the
difference in temperature at the two ends of the conductor ($T_1$ and $T_2$ are the two
temperatures). The constant of proportionality is the thermal conductance, $\gamma$.
There are two different ways of approximating this model. First, we can assume
that the conductor is an ideal thermal insulator, so that there is no heat flow.
Second, we can assume that the conductor is an ideal thermal conductor, so
that there is never a difference between the two temperatures.

Thermal conduction          $f = \gamma(T_1 - T_2)$
------------------------------------------------------------
Ideal thermal insulator     $f = 0$
Ideal thermal conductor     $T_1 = T_2$

11.  Thermal  conductance 

The thermal conductance, $\gamma$, of a thermal conductor is dependent on the length,
l, the cross-sectional area, A, and the thermal conductivity, k, of the conductor.
When the dependence of $\gamma$ on these factors is unnecessary, one can merely
assume that it is constant.

Dependent thermal conductance     $\gamma = kA/l$
------------------------------------------------------------
Constant thermal conductance      exogenous($\gamma$)

12.  Electrical  conduction 

The current flow, i, across an electrical conductor is proportional to the voltage
drop, V, across the conductor. The constant of proportionality is the resistance,
R, and the relationship is Ohm’s law. There are two different ways of
approximating this model. First, we can assume that the conductor is an ideal
electrical insulator, so that there is no current flow. Second, we can assume
that the conductor is an ideal electrical conductor, so that the voltage drop is
always zero.

Ohm’s law                      $V = iR$
------------------------------------------------------------
Ideal electrical insulator     $i = 0$
Ideal electrical conductor     $V = 0$

13.  Electrical  resistance 

The electrical resistance, R, of an electrical conductor is dependent on the
length, l, the cross-sectional area, A, and the resistivity, $\rho$, of the conductor.
When the dependence of R on these factors is unnecessary, one can merely
assume that it is constant.

Dependent resistance     $R = \rho l / A$
------------------------------------------------------------
Constant resistance      exogenous(R)

14.  Resistivity 

The resistivity, $\rho$, of an electrical conductor is a function of the temperature, T,
of the conductor. $\rho_0$ is the resistivity at temperature $T_0$, and $\alpha$ is the coefficient
of resistivity. However, this dependence is often neglected, and the resistivity
is assumed to be constant.

Temperature dependent resistivity     $\rho = \rho_0(1 + \alpha(T - T_0))$
------------------------------------------------------------
Constant resistivity                  exogenous($\rho$)

15.  Heat  engine 

A  heat  engine  can  be  thought  of  as  a  cyclic  process  that  extracts  heat  from  a 
high  temperature  source,  converts  part  of  this  heat  into  work,  and  discharges 
the  rest  of  the  heat  to  a  low  temperature  sink.  The  efficiency,  e,  of  a  heat  engine 
is  the  fraction  of  extracted  heat  that  is  converted  into  work.  Carnot  showed  that 
the  efficiency  of  an  ideal  heat  engine  is  a  function  of  the  source  temperature, 
$T_1$, and sink temperature, $T_2$, and that the efficiency of a real heat engine is less
than or equal to the ideal efficiency by an efficiency factor, $\gamma$.

Real heat engine      $e = \gamma(1 - T_2/T_1)$
------------------------------------------------------------
Ideal heat engine     $e = 1 - T_2/T_1$

16.  Laminar  flow  in  horizontal  pipes 

The rate, V, of laminar flow of a fluid in a pipe is proportional to the difference
between the pressure at one end of the pipe, $P_1$, and the pressure at the other end
of the pipe, $P_2$. The pressure drop in the pipe is due to the viscous resistance,
R, of the fluid. This model is often approximated to disregard the viscous
resistance, so that there is no pressure drop across the pipe.

Viscous flow      $P_1 - P_2 = RV$
------------------------------------------------------------
Inviscid flow     $P_1 - P_2 = 0$

17.  Thermal  expansion 

When objects are heated, they expand. The amount of expansion, $\delta$, is a
function of the object’s temperature, T, and the coefficient of thermal expansion,
$\alpha$. $\delta$ is assumed to be zero when the size of the object is $l_0$ at temperature $T_0$.
This expansion is often quite small, and can be disregarded for many purposes.

Thermal expansion        $\delta = \alpha l_0 (T - T_0)$
------------------------------------------------------------
No thermal expansion     $\delta = 0$

18.  Exogenizing  and  equilibrating  differential  equations 

Chapter  6  shows  that  exogenizing  and  equilibrating  differential  equations  can 
be  considered  to  be  causal  approximations.  For  example,  the  rate  of  change 
of the temperature, T, of an object is a function of the net heat, F, flowing
into the object and the object’s heat capacity, C. This differential equation
can be exogenized by assuming that the temperature is constant. It can be
equilibrated by assuming that the temperature quickly adjusts itself to ensure
that the net heat flow is zero.

Dynamic thermal model       $dT/dt = F/C$
------------------------------------------------------------
Constant temperature        exogenous(T)
Equilibrium temperature     $F = 0$
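
To make the difference between the dynamic, exogenized, and equilibrated models concrete, the sketch below integrates a dynamic thermal model of the form dT/dt = F/C. The closure F = gamma * (T_env - T), which borrows the thermal conduction form from item 10, is our assumption for illustration, as are all numerical values and function names.

```python
def simulate_dynamic(T_init, T_env, gamma, C, dt, steps):
    """Forward Euler integration of dT/dt = F/C, with the assumed
    net heat flow F = gamma * (T_env - T)."""
    T = T_init
    for _ in range(steps):
        F = gamma * (T_env - T)
        T += dt * F / C
    return T

T_init, T_env = 20.0, 80.0
gamma, C = 2.0, 10.0  # time constant C / gamma = 5 time units

# Short horizon: the temperature has barely moved (exogenizing is reasonable).
T_short = simulate_dynamic(T_init, T_env, gamma, C, dt=0.01, steps=1)
# Long horizon (20 time constants): F is essentially zero (equilibrating is reasonable).
T_long = simulate_dynamic(T_init, T_env, gamma, C, dt=0.01, steps=10000)

print(T_short, T_long)
```

Over time scales much shorter than C/gamma the constant temperature model matches the dynamic one, while over much longer time scales the equilibrium model does.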

It  is  interesting  to  ask  whether  there  are  commonly  used  approximations  that  are 
not causal approximations. We have identified a class of approximations
that do not exactly fit our definition of causal approximations, but are close. The
problem is that the more approximate model fragments contain parameters not found
in the more accurate model fragments. Here are some examples:


1. Viscosity of gases

The viscosity, $\mu$, of a gas is a function of its temperature, T, and mass, m,
(equivalently, its molecular weight, M). There are at least two models of this
dependence. An approximate model assumes that the gas molecules are hard
balls of diameter d. A more accurate model models the gas molecule as a force
field, and uses the Lennard-Jones potential energy function. These models have
been taken from [Welty et al., 1984].


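Force field model      $\mu = 2.6693 \times \ldots$
------------------------------------------------------------
Rigid sphere model     $\mu = \frac{2}{3\pi^{3/2}} \frac{\sqrt{m \kappa T}}{d^2}$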

Note that the force field model does not contain parameters like d that are
found  in  the  rigid  sphere  model. 

2.  Linearizations 

Complicated equations are often approximated by linearizing them. Such linearizations
introduce additional parameters, such as the slope of the line. These
parameters  are  clearly  not  part  of  the  original  equation. 
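
The following sketch illustrates how a linearization introduces parameters (a slope and an intercept) that do not appear in the original equation. It linearizes Newton's law of gravity around a reference distance using a central difference; the choice of equation, the numerical values, and the helper names are assumptions made for illustration.

```python
def linearize(f, x0, h):
    """Return (slope, intercept) of the tangent-line approximation of f at x0,
    with the slope estimated by a central difference of half-width h."""
    slope = (f(x0 + h) - f(x0 - h)) / (2 * h)
    intercept = f(x0) - slope * x0
    return slope, intercept

def gravity(r, GM=3.986e14):
    """Newton's law of gravity, g = GM / r^2 (GM roughly that of the Earth)."""
    return GM / r ** 2

r0 = 6.371e6  # reference distance in metres, roughly the Earth's radius
slope, intercept = linearize(gravity, r0, h=1.0)

r = r0 + 1000.0
approx = slope * r + intercept
exact = gravity(r)
print(slope, approx, exact)
```

The slope and intercept are local to the linearized model fragment: they have no counterpart in g = GM/r^2, which is exactly the situation described above.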

While the above two approximations do not fit our definition of a causal approximation,
one can see that they almost do. In particular, in the first case, if we are
only interested in the dependence of $\mu$ on T, then the approximation behaves like a
causal approximation. Similarly, in the second case, if we are not interested in the
dependence of any parameter on the additional parameters like the slope, then the
linearization  behaves  like  a  causal  approximation. 

Hence,  we  can  generalize  our  definition  of  causal  approximations  by  allowing  more 
approximate  model  fragments  to  have  parameters  not  in  the  more  accurate  model 
fragment, but add the restriction that such parameters be local to the more approximate
model fragment. Furthermore, we must also require that we are not interested
in the dependence of any parameter on such parameters. Both these restrictions seem
reasonable  in  the  above  cases. 


Appendix  B 
Example  devices 


In this appendix we describe the electromechanical devices that our model selection
program  was  tested  on. 


B.1 Bimetallic strip temperature gauge


The bimetallic strip temperature gauge is shown in Figure B.1. It is based on a
similar temperature gauge described in [Macaulay, 1988, page 290]. A thermistor is a
semi-conductor device: a small increase in its temperature causes a large decrease in
its resistance. A bimetallic strip consists of two strips made of different metals that
are joined together. Temperature changes cause the two strips to expand by different
amounts, causing the bimetallic strip to bend.

It works as follows: the thermistor senses the water temperature. The thermistor’s
temperature determines its resistance, which determines the current flowing in the
circuit.  This  determines  the  amount  of  heat  dissipated  in  the  wire,  which  determines 
the  temperature  of  the  bimetallic  strip.  The  temperature  of  the  bimetallic  strip 
determines its deflection, which determines the position of the pointer along the scale.

Figure B.1: Bimetallic strip temperature gauge
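
The chain of "determines" relations in the description of this gauge can be read as a directed graph over parameters. The sketch below encodes it and checks whether one parameter causally influences another; the parameter names and the graph encoding are hypothetical, chosen only to mirror the prose description.

```python
# Hypothetical parameter names; each edge reads "cause determines effect".
CAUSAL_EDGES = {
    "water temperature": ["thermistor temperature"],
    "thermistor temperature": ["thermistor resistance"],
    "thermistor resistance": ["circuit current"],
    "circuit current": ["heat dissipated in wire"],
    "heat dissipated in wire": ["bimetallic strip temperature"],
    "bimetallic strip temperature": ["bimetallic strip deflection"],
    "bimetallic strip deflection": ["pointer position"],
}

def causally_influences(edges, cause, effect):
    """Return True if a directed path leads from `cause` to `effect`."""
    stack, seen = [cause], set()
    while stack:
        node = stack.pop()
        for successor in edges.get(node, ()):
            if successor == effect:
                return True
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return False

print(causally_influences(CAUSAL_EDGES, "water temperature", "pointer position"))
print(causally_influences(CAUSAL_EDGES, "pointer position", "water temperature"))
```

An expected behavior stated as a causal relation between two parameters corresponds to the existence of such a path; the reverse query fails because the relation is directed.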

B.2  Bimetallic  strip  thermostat 

The  bimetallic  strip  thermostat  is  shown  in  Figure  B.2.  It  is  based  on  thermostats 
described in [Macaulay, 1988, page 162]. Its operation is very straightforward: the
bimetallic strip senses the temperature of its environment. The bimetallic strip’s
temperature determines its deflection. If the temperature is too high, the bimetallic
strip bends enough to lose connection with the contact, which breaks the electrical
circuit and turns the heater off. Otherwise, the bimetallic strip does not bend enough
to  lose  its  connection  with  the  contact,  so  that  the  electrical  circuit  is  completed,  and 
the  heater  is  turned  on. 
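
The on/off behavior described above can be mimicked with a very small simulation. The sketch below is a toy under our own assumptions: a single cutoff temperature stands in for the strip's deflection, the room loses heat in proportion to its difference from the outside temperature, and all names and values are hypothetical.

```python
def heater_is_on(temperature, cutoff):
    """The strip keeps the contact closed (heater on) until the
    temperature exceeds the cutoff at which it bends away."""
    return temperature <= cutoff

def simulate_room(T, cutoff, T_outside, heat_rate, loss_coeff, dt, steps):
    """Forward Euler simulation; returns a list of (temperature, heater_on)."""
    history = []
    for _ in range(steps):
        on = heater_is_on(T, cutoff)
        heating = heat_rate if on else 0.0
        T += dt * (heating - loss_coeff * (T - T_outside))
        history.append((T, on))
    return history

history = simulate_room(T=15.0, cutoff=20.0, T_outside=10.0,
                        heat_rate=2.0, loss_coeff=0.1, dt=0.1, steps=500)
final_temps = [T for T, _ in history[-100:]]
print(min(final_temps), max(final_temps))
```

After an initial warm-up the heater switches on and off repeatedly and the temperature stays close to the cutoff, which also shows the device passing through two distinct operating regions.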
B.3 Flexible link temperature gauge


The  flexible  link  temperature  gauge  is  shown  in  Figure  B.3.  It  is  adapted  from  a 
flexible link resistance box mechanism described in [Artobolevsky, 1980, page 451].
Its  operation  is  based  on  the  principle  that  the  length  of  a  wire  is  dependent  on  its 
temperature. 


Figure B.3: Flexible link temperature gauge


It  works  as  follows:  the  thermistor  senses  the  water  temperature.  The  thermistor’s 
temperature  determines  its  resistance,  which  determines  the  current  flowing  in  the 
circuit.  The  current  in  the  circuit  determines  the  amount  of  heat  generated  in  the 
wire,  which  determines  the  wire’s  temperature.  The  wire’s  temperature  determines 
the  wire’s  length.  Since  the  wire  is  fixed  at  its  two  ends,  the  length  of  the  wire 
determines  the  deflection  of  the  spring,  and  hence  the  position  of  the  pointer  along 
the  scale. 


B.4 Galvanometer temperature gauge

The  galvanometer  temperature  gauge  is  shown  in  Figure  B.4.  The  galvanometer 
shown  in  this  figure  has  been  adapted  from  [Halliday  and  Resnick,  1978,  page  726]. 
Its operation is based on the interaction between the field of a permanent magnet
and  the  magnetic  field  generated  by  an  electric  current. 


Figure B.4: Galvanometer temperature gauge


It  works  as  follows:  the  thermistor  senses  the  water  temperature.  The  thermistor’s 
temperature  determines  its  resistance,  which  determines  the  current  flowing  in  the 
circuit.  The  current  flowing  in  the  wire  generates  a  magnetic  field,  which  is  magnified 
by  the  iron  core.  This  magnetic  field  interacts  with  the  magnetic  field  generated  by 
the  permanent  magnet,  producing  a  torque  on  the  iron  core.  This  causes  the  iron 
core,  and  hence  the  pointer,  to  deflect.  The  spring  provides  a  restoring  torque,  so 
that the deflection of the pointer is proportional to the strength of the torque.

B.5 Electric bell


The  electric  bell  is  shown  in  Figure  B.5.  It  has  been  adapted  from  [Artobolevsky, 
1980,  page  129].  It  works  as  follows:  when  the  switch  is  pressed,  the  electric  circuit  is 
closed. This energizes the winding of the electromagnet, attracting the armature so
that  the  clapper  can  strike  the  bell.  This  also  breaks  the  circuit  at  the  contact,  and 
hence  deactivates  the  electromagnet.  The  armature  returns  to  its  original  position 
due  to  the  flat  spring,  closing  the  circuit  and  repeating  the  above  cycle. 




B.6  Magnetic  sizing  device 

The magnetic sizing device is shown in Figure B.6. It has been adapted from [Artobolevsky,
1980, page 57]. It is used to ensure that the size of the workpiece is within
desired  limits. 


It works as follows: the position of the measuring spindle is determined by the size
of the workpiece. This determines the position of the armature with respect to the
two cores, and hence determines the width of the four air gaps between the ends of
the  armature  and  the  ends  of  the  cores.  The  current  flowing  in  the  circuit  generates 
a  magnetic  flux  in  the  cores  and  the  air  gaps.  The  strength  of  the  magnetic  flux  is 
determined  by  the  width  of  the  air  gaps.  Hence,  the  strength  of  the  magnetic  flux 
can  be  used  as  a  measure  of  the  size  of  the  workpiece. 


B.7  Carbon  pile  regulator 

The carbon pile regulator is shown in Figure B.7. This device is adapted from [Artobolevsky,
1980, page 108]. Its operation is based on the fact that the resistance of a
carbon  pile  is  dependent  on  the  compressive  force  acting  on  it. 

The  carbon  pile  regulator  works  as  follows:  the  position  of  the  rheostat  determines 
the  resistance  of  the  top  circuit.  This  determines  the  current  flowing  in  that  circuit, 
which determines the strength of the magnetic field generated by the electromagnet.
The  electromagnet  attracts  the  armature,  with  the  leaf  spring  providing  a  restoring 
force.  The  force  on  the  armature  decreases  the  compressive  force  on  the  carbon  pile, 
thereby  changing  its  resistance.  Hence,  the  resistance  at  the  output  leads  is  a  function 
of  the  position  of  the  rheostat. 

B.8  Electromagnetic  relay  thermostat 

The  electromagnetic  relay  thermostat  is  shown  in  Figure  B.8.  This  device  is  similar 
to the bimetallic strip thermostat. The primary difference is that, in this device, the
bimetallic strip turns on an electromagnetic relay which turns on the heater, rather
than the bimetallic strip turning on the heater directly.

The  operation  of  the  electromagnetic  relay  thermostat  is  similar  to  the  bimetallic 
strip thermostat. When the environment becomes too cold, the bimetallic strip deflects
less and hence closes the contact. The resulting current flow creates a magnetic
field in the winding around the core, which attracts the armature. The armature
moves downwards and closes the contacts, turning on the heater. When the room
is too hot, the bimetallic strip breaks its connection with the contact, and current
stops flowing in the lower circuit. This causes the armature to return to its original
position due to the restoring spring, thereby turning the heater off.

Figure B.8: Electromagnetic relay thermostat


B.9  Tachometer 

The tachometer is shown in Figure B.9. This device has been adapted from [Artobolevsky,
1980, page 90]. The tachometer measures the angular velocity of the core
of the windings at the top of the figure.

The  tachometer  consists  of  a  generator  and  a  galvanometer.  The  generator  consists 
of  the  windings  and  permanent  magnet  at  the  top  of  the  figure,  while  the  galvanometer 
consists of the windings, permanent magnet, and pointer assembly at the bottom
of  the  figure.  The  tachometer  works  as  follows:  the  rotation  of  the  windings  of  the 
generator within the magnetic field of the permanent magnet induces an electromotive
force. This induced electromotive force drives current in the circuit, generating an
electromagnetic  field  in  the  windings  of  the  galvanometer.  This  electromagnetic  field 
interacts  with  the  magnetic  field  of  the  permanent  magnet,  causing  the  pointer  to 
deflect  along  the  scale  (the  restoring  spring  is  not  shown). 


B.10 Car distributor system

A schematic of the car distributor system is shown in Figure B.10. This system
has  been  adapted  from  [van  Amerongen,  1967,  pages  482-483].  The  function  of  the 
distributor  system  in  a  car  is  to  ensure  that  the  spark  plugs  fire  at  appropriate  times. 

The distributor system works as follows: as the cam rotates, it opens the contact
breaker. This causes the current in the primary windings to drop rapidly (the
condenser prevents a spark from jumping across the contact breaker). The rapid
change in current in the primary winding causes a large induced electromotive
force in the secondary winding. At the same time, the distributor rotor connects
the secondary winding to one of the spark plugs (the rightmost spark plug in the
figure). The large induced electromotive force causes a spark to jump across the
spark plug.

Figure B.10: Car distributor system


Appendix  C 

Composable  operators 

In this appendix we describe the composable operators shown in Table 2.1. All the
composable operators share two characteristics: (a) they are all converted into an
equivalent set of algebraic equations; and (b) they all use a form of the closed world
assumption. In particular, a set C of compositional operators is converted into an
equivalent set E of algebraic equations, using the closed world assumption that the
only elements in C are the ones that are known to be in C.


C.1 Influences

The $I^+$ and $I^-$ operators are the same as in [Forbus, 1984]. $I^+(q_1, q_2)$ states that
$q_2$ is a positive influence on $q_1$, while $I^-(q_1, q_2)$ states that $q_2$ is a negative
influence on $q_1$. Given a set of $I^+$ and $I^-$ statements, we can collect together all
the positive and negative influences on each parameter. For each parameter we do
the following: (a) form the term resulting from the difference of the sum of all the
positive influences and the sum of all the negative influences; and (b) make the
derivative of the parameter be equal to this term. For example, from the set
$\{I^+(q_1, q_2), I^-(q_1, q_3), I^+(q_1, q_4)\}$ we would produce the equation:

$$\frac{dq_1}{dt} = q_2 - q_3 + q_4$$

Note how we controlled the parameters that could be determined by the resulting
equation. However, the left hand side parameter ($dq_1/dt$) can always be determined
by the equation.

The sum-term operator is exactly like the $I^+$ operator, except that step (b) above
is modified by making the parameter itself equal to the constructed term. For
example, from the set $\{\textit{sum-term}(q_1, q_2), \textit{sum-term}(q_1, -q_3), \textit{sum-term}(q_1, q_4)\}$
we would produce the equation:

$$q_1 = q_2 - q_3 + q_4$$
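
The conversion described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function name `influence_equations`, the triple encoding of statements, and the convention of marking a negated sum-term argument with a leading "-" are all assumptions made for the example. The closed world assumption shows up in the fact that only the statements passed in contribute terms.

```python
from collections import defaultdict


def influence_equations(statements):
    """Turn (op, target, source) triples into one equation string per target.

    op is "I+" or "I-" (the derivative of the target is constrained) or
    "sum-term" (the target itself is constrained). For "sum-term" the
    source may carry a leading "-" to mark a negated term (hypothetical
    encoding chosen for this sketch).
    """
    terms = defaultdict(list)  # (lhs kind, target) -> list of (sign, name)
    for op, target, source in statements:
        if op == "I+":
            terms[("derivative", target)].append(("+", source))
        elif op == "I-":
            terms[("derivative", target)].append(("-", source))
        elif op == "sum-term":
            if source.startswith("-"):
                terms[("value", target)].append(("-", source[1:]))
            else:
                terms[("value", target)].append(("+", source))
        else:
            raise ValueError(f"unknown operator: {op}")

    equations = []
    for (kind, target), signed in terms.items():
        rhs = ""
        for sign, name in signed:
            if not rhs:
                rhs = name if sign == "+" else f"-{name}"
            else:
                rhs += f" {sign} {name}"
        lhs = f"d{target}/dt" if kind == "derivative" else target
        equations.append(f"{lhs} = {rhs}")
    return equations


print(influence_equations([("I+", "q1", "q2"), ("I-", "q1", "q3"), ("I+", "q1", "q4")]))
```

Because all statements about a parameter are gathered before its equation is emitted, adding a further influence on the same parameter changes that one equation rather than adding a second, conflicting one.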


C.2 Kirchhoff’s laws

The sum-to-zero, same-value, same-reference, and same-circuit operators are primarily
used to implement Kirchhoff’s laws, though they can be used for other purposes
also. Kirchhoff’s laws are necessary when modeling generalized flows in networks. We
will discuss these laws in the context of electrical circuits, but the discussion is equally
applicable to other types of circuits, including fluid, thermal, and magnetic circuits.
We will also restrict our discussion to electrical components with two terminals; the
generalization to components with three or more terminals is straightforward.

An electrical network is formed by connecting terminals of electrical components.
Each component terminal has two important attributes: (a) the voltage of the
terminal; and (b) the current flow into the component at that terminal. Hence, a
network with n nodes and e edges has 2e currents and 2e voltages, and 4e independent
equations are needed to determine all the currents and voltages. Kirchhoff’s laws and
network theory tell us the source of these 4e independent equations:

1. Each component in the network has a component equation. For example, Ohm’s
law for a resistor is a component equation. Since each component is an edge in
the network, there are e independent component equations.

2.  Kirchhoff’s  current  law  tells  us  that  the  net  current  flow  into  each  component  is 
zero.  Hence,  the  sum  of  the  two  currents  flowing  into  each  component  is  zero. 
There  are  e  such  independent  equations. 



3. Kirchhoff’s current law tells us that the net current flowing into any node in
the network is zero. Hence, there are n equations stating Kirchhoff’s current
law for each of the n nodes in the network. However, network theory tells us
that for a connected network with n nodes, only (n - 1) of these equations are
independent equations (though any (n - 1) of these equations will do).

4. Kirchhoff’s voltage law tells us that the voltages of connected terminals are
equal. Hence, at a node where k terminals are connected, there are k - 1
independent equations stating the equality of the k voltages. One can verify
that there are a total of 2e - n such independent equations.

5. The terminal voltages are measured with respect to a reference voltage. Hence,
for a connected network with n nodes, any one voltage can be arbitrarily selected
as the reference voltage, and this voltage can be set to zero. This provides one
equation.

Adding  up  all  the  equations,  we  see  that  there  are  4e  independent  equations  that  can 
be  used  to  determine  values  for  the  4e  voltages  and  currents. 
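
The count can be checked directly. The short derivation below is a worked restatement of points 1 to 5 for a connected network; the symbol $k_v$ (the number of terminals connected at node $v$) is introduced here only for the derivation, and we assume that every terminal is attached to exactly one node, so that the $k_v$ sum to the $2e$ terminals.

```latex
\begin{align*}
\sum_{v=1}^{n} (k_v - 1) &= \Big(\sum_{v=1}^{n} k_v\Big) - n = 2e - n, \\
\underbrace{e}_{\text{point 1}} + \underbrace{e}_{\text{point 2}}
  + \underbrace{(n - 1)}_{\text{point 3}} + \underbrace{(2e - n)}_{\text{point 4}}
  + \underbrace{1}_{\text{point 5}} &= 4e .
\end{align*}
```

The first line accounts for the $2e - n$ equality constraints of point 4; the second shows that the $n$-dependent terms cancel, leaving exactly $4e$ equations.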

One can easily associate the 2e equations under points 1 and 2 with the
corresponding components, and hence those equations are easy to generate.

Given a network, one can write a program to generate the remaining 2e independent
equations. Indeed, we started by doing precisely that. However, the program
turned out to be very difficult to understand, maintain, and generalize. In particular,
we needed the following two generalizations: (a) since our initial implementation
dealt only with electrical circuits, we wanted to generalize it to other types of
circuits; and (b) we wanted to handle disconnected networks. Disconnected networks
need special handling. Each connected component needs its own reference voltage,
and each connected component has one dependent Kirchhoff’s current law equation.

To  facilitate  these  generalizations,  we  discarded  the  above  procedural  method  of 
generating  the  2e  equations,  and  instead  developed  a  more  declarative  method.  This 
method  is  based  on  the  use  of  the  same-value,  same-reference,  sum-to-zero,  and 
same-circuit  operators. 

The same-value operator is a binary operator that states that its two arguments
are equal. Given a set of same-value statements, it is easy to partition the set of
parameters into equivalence classes such that two parameters are in the same
equivalence class if and only if the set of same-value constraints require them to be
equal. Then, for an equivalence class with k parameters, one can generate (k - 1)
independent equality constraints that ensure that all the parameters in that
equivalence class are equal.

We use the same-value operator to generate the (2e - n) equality constraints
discussed in point 4 above. In particular, if t1 and t2 are any two connected
electrical terminals, then we assert same-value(voltage(t1), voltage(t2)), i.e., we have
the following rule:

connected-to(t1, t2) => same-value(voltage(t1), voltage(t2))

This  allows  us  to  generate  the  required  equality  constraints. 
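
As an illustration of how a set of same-value statements yields (k - 1) equalities per equivalence class, the following Python sketch partitions the parameters with a union-find structure. It is a sketch under our own assumptions rather than the code used in this work: the function names and the string encoding of parameters such as "v(a2)" are hypothetical.

```python
def equivalence_classes(pairs):
    """Partition the items mentioned in `pairs` using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for item in list(parent):
        classes.setdefault(find(item), []).append(item)
    return [sorted(members) for members in classes.values()]


def same_value_equations(pairs):
    """Emit k - 1 equalities (as pairs) for each equivalence class of size k."""
    equations = []
    for members in equivalence_classes(pairs):
        first = members[0]
        equations.extend((first, other) for other in members[1:])
    return equations


# Three terminals meet at one node and two at another: 2 + 1 = 3 equalities.
statements = [("v(a2)", "v(b1)"), ("v(b1)", "v(c1)"), ("v(a1)", "v(c2)")]
print(same_value_equations(statements))
```

Redundant statements (for example, also asserting that "v(a2)" and "v(c1)" are equal) do not add equations, because they fall inside an existing class.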

The same-reference operator is a binary operator that states that its two arguments
share a common reference voltage. Given a set of same-reference statements, it
is easy to partition the set of voltages into equivalence classes such that two voltages
are in the same equivalence class if and only if they share the same reference voltage.
It is then easy to pick an arbitrary member from each such equivalence class and set
it to zero.

We  use  the  same-reference  operator  to  generate  the  equation  discussed  under 
point  5  above.  We  enforce  this  by  ensuring  that  voltages  of  connected  terminals  have 
the  same  reference,  and  voltages  of  terminals  belonging  to  the  same  component  have 
the  same  reference.  This  is  equivalent  to  the  following  rules: 

connected-to(t1, t2) => same-reference(voltage(t1), voltage(t2))
same-component(t1, t2) => same-reference(voltage(t1), voltage(t2))

This ensures that every connected network has a single reference voltage.

The  advantage  of  the  declarative  method  can  be  clearly  demonstrated  with  a 
slight variation of the second rule above. In particular, if t1 and t2 are terminals of
an  electrical  switch  that  is  off,  then  it  is  incorrect  to  say  that  their  corresponding 
voltages  necessarily  have  the  same  reference.  Using  our  declarative  method,  it  is  easy 
to  modify  the  above  rule  to  exclude  this  case. 
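
A minimal sketch of the two same-reference rules, including the exception for a switch that is off, is given below. The function `reference_equations`, its argument layout, and the `open_switches` parameter are assumptions made for this illustration; the arbitrary member chosen from each class is simply the union-find root.

```python
def reference_equations(connected, components, open_switches=frozenset()):
    """Pick one zero-voltage reference terminal per same-reference class.

    connected:     iterable of (t1, t2) terminal pairs related by connected-to
    components:    dict mapping component name -> (t1, t2), its two terminals
    open_switches: names of components whose terminals must NOT be linked
    """
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            t = parent[t]
        return t

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for t1, t2 in connected:  # first rule: connected terminals
        union(t1, t2)
    for name, (t1, t2) in components.items():  # second rule, with the exception
        if name in open_switches:
            find(t1)  # register the terminals, but do not link them
            find(t2)
        else:
            union(t1, t2)

    roots = {find(t) for t in list(parent)}
    return [f"voltage({t}) = 0" for t in sorted(roots)]


parts = {"B": ("b1", "b2"), "S": ("s1", "s2"), "R": ("r1", "r2")}
wires = [("b2", "s1"), ("s2", "r1")]
print(reference_equations(wires, parts))                       # switch closed
print(reference_equations(wires, parts, open_switches={"S"}))  # switch open
```

With the switch closed the three components share one reference; with it open, the two sides are no longer related by same-reference and each receives its own reference equation. The only change needed was the condition guarding the second rule.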

The sum-to-zero operator is a binary operator that identifies parameters that will
be terms in a sum that will be zero. Like same-value and same-reference, sum-to-zero
is an equivalence relation. Hence, given a set of sum-to-zero statements, we can
partition the set of parameters into a set of equivalence classes. The sum-to-zero
operator is interpreted as stating that the parameters in each equivalence class sum
up to zero.

We use the sum-to-zero operator to generate the n equations under point 3 above.¹
We do this by stating that the currents corresponding to connected terminals sum to
zero. In particular, we have the following rule:

connected-to(t1, t2) => sum-to-zero(current(t1), current(t2))

This  ensures  that  the  currents  flowing  out  of  every  node  sum  to  zero,  as  required  by 
Kirchhoff’s  current  law. 

Finally, same-circuit is a binary relation on currents that states that its two
arguments are part of the same circuit. Given a set of same-circuit statements,
we can partition the set of currents into a set of equivalence classes such that two
currents are in the same equivalence class if and only if they are part of the same
circuit. Currents are in the same circuit if (a) they belong to connected terminals;
or (b) they belong to terminals of the same component. In particular, we have the
following rules:

connected-to(t1, t2) => same-circuit(current(t1), current(t2))
same-component(t1, t2) => same-circuit(current(t1), current(t2))

The identification of currents that are in the same circuit allows us to generate
exactly (n - 1) independent equations, as required by point 3 above. In particular,
the sum-to-zero equivalence classes generated above are further clustered into classes,
such that two sum-to-zero equivalence classes are in the same class if and only if a
parameter in one sum-to-zero equivalence class is in the same circuit as a parameter
in the other sum-to-zero equivalence class. The sum-to-zero equivalence classes in the
same class correspond to the n equations of a connected network discussed in point 3
above. To get the (n - 1) independent equations, we merely discard any one of the
sum-to-zero equivalence classes from each class, and use the remaining sum-to-zero
equivalence classes as above.

¹ The fact that only (n - 1) of these equations will be independent is handled when
we discuss same-circuit.
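
The interplay between sum-to-zero and same-circuit can be sketched as follows. This is an illustrative sketch only: the function `kcl_equations`, the terminal names, and the choice of which class to discard (the first in sorted order) are assumptions; the text above allows any one class per circuit to be dropped.

```python
def kcl_equations(connected, components):
    """Generate independent node-current equations, one circuit at a time.

    connected:  iterable of (t1, t2) terminal pairs related by connected-to
    components: dict mapping component name -> (t1, t2), its two terminals
    """
    def make_union_find():
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                x = parent[x]
            return x

        def union(a, b):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

        return parent, find, union

    node_parent, node_of, join_node = make_union_find()  # sum-to-zero classes
    _, circuit_of, join_circuit = make_union_find()      # same-circuit classes

    for t1, t2 in connected:
        join_node(t1, t2)
        join_circuit(t1, t2)
    for t1, t2 in components.values():
        join_circuit(t1, t2)

    nodes = {}
    for t in list(node_parent):
        nodes.setdefault(node_of(t), []).append(t)

    # Cluster the sum-to-zero classes by the circuit their currents belong to.
    by_circuit = {}
    for members in nodes.values():
        by_circuit.setdefault(circuit_of(members[0]), []).append(sorted(members))

    equations = []
    for node_list in by_circuit.values():
        node_list.sort()
        for members in node_list[1:]:  # discard one class per circuit
            terms = " + ".join(f"current({t})" for t in members)
            equations.append(f"{terms} = 0")
    return equations


# Two disconnected loops: one with two nodes, one with three.
parts = {"A": ("a1", "a2"), "B": ("b1", "b2"),
         "C": ("c1", "c2"), "D": ("d1", "d2"), "E": ("e1", "e2")}
wires = [("a1", "b1"), ("a2", "b2"), ("c2", "d1"), ("d2", "e1"), ("e2", "c1")]
for equation in kcl_equations(wires, parts):
    print(equation)
```

For a network made of several disconnected parts, one equation is dropped per part, which is the special handling of disconnected networks mentioned earlier.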

C.3  Other  methods 

It  is  worth  noting  that  the  same-value  and  sum-to-zero  operators  are  necessary  only 
because  of  our  representation  of  the  connected-to  relation,  i.e.,  because  connected- 
to  directly  relates  terminals  of  components.  A  different  representation  would  make 
same-value  and  sum-to-zero  unnecessary.  In  particular,  we  can  (a)  introduce  new 
entities  called  nodes,  one  for  each  node  in  the  network;  and  (b)  require  that  terminals 
can  only  be  connected  to  nodes,  and  not  to  other  terminals.  (This  representation  has 
been  used  by  others,  e.g.,  [Williams,  1989].) 

Using  this  representation,  we  can  associate  a  voltage  with  each  node,  and  a  net 
current flow into each node. We can easily make the voltages of all terminals connected
to  a  node  equal  to  the  node  voltage,  and  we  can  use  the  sum-term  operator  to  state 
that  the  net  current  flow  into  a  node  is  equal  to  the  sum  of  the  current  in  the  terminals 
connected  to  that  node.  We  can  further  state  that  the  net  current  flow  into  a  node 
is  zero. 

While the above representation removes the need for same-value and sum-to-zero,
it does not remove the need for same-reference and same-circuit. These latter two
operators are still necessary to ensure that exactly one reference voltage is selected for
every connected network, and that an independent set of Kirchhoff’s current law
equations is generated.


Appendix D

Matchings in bipartite graphs


In  this  appendix  we  discuss  an  algorithm,  based  on  network  flow  techniques,  for 
finding  maximum  matchings  in  bipartite  graphs.  The  relationship  of  this  algorithm 
to network flow is discussed in textbooks such as [Cormen et al., 1990; Even, 1979].
We  present  the  algorithm  here  since  some  of  the  proofs  in  Chapter  6  depend  on  an 
understanding  of  it. 

Definition D.1 (Bipartite graph) A bipartite graph $G = (X, Y, E)$, where $X \cup Y$
is the set of nodes and $E$ is the set of edges, is a graph whose nodes can be partitioned
into two sets, $X$ and $Y$, such that all edges in $E$ connect a node in $X$ to a node in
$Y$.

Definition D.2 (Matching) A matching in a bipartite graph is a subset of the edges
of the graph such that no two edges in the matching share a common node. A
maximum matching in a bipartite graph is a matching of maximum cardinality.

Definition D.3 (Augmenting path) Let $G = (X, Y, E)$ be a bipartite graph, and
let $U \subseteq E$ be a matching. An augmenting path in $G$, with respect to $U$, is a
sequence of nodes such that:

1. no node is repeated in the sequence;

2. alternate nodes in the sequence are in $X$, with the first node being in $X$;

3. alternate nodes in the sequence are in $Y$, with the last node being in $Y$;

4. if $x \in X$ and $y \in Y$ are consecutive nodes in the sequence, with $x$ before $y$, then
there is an edge $e \in E$ such that $e$ is incident on both $x$ and $y$, and $e$ is not in
$U$, i.e., the augmenting path goes from nodes in $X$ to nodes in $Y$ via edges not
in the matching;

5. if $y \in Y$ and $x \in X$ are consecutive nodes in the sequence, with $y$ before $x$, then
there is an edge $e \in E$ such that $e$ is incident on both $x$ and $y$, and $e$ is in $U$,
i.e., the augmenting path goes from nodes in $Y$ to nodes in $X$ via edges in the
matching.

The  edges  of  an  augmenting  path  are  the  edges  that  connect  consecutive  nodes  in  the 
augmenting  path. 


Figure D.1: Matchings in a bipartite graph. (a) A matching. (b) A maximum matching.


For example, Figure D.1a shows a bipartite graph. The bold edges with arrows
at each end form a matching. One can easily verify that the sequence

$$(x_4, y_2, x_2, y_3, x_3, y_1, x_1, y_4) \tag{D.1}$$

is an augmenting path. Note that any augmenting path has one fewer edge in the
matching than not in the matching. Hence, augmenting paths can be used to increase
the cardinality of a matching by replacing the edges of the matching that are in the
augmenting path by edges in the augmenting path that are not in the matching.
Hence, we have the following:

Lemma D.1 (Increasing a matching) Let $G = (X, Y, E)$ be a bipartite graph,
and let $U \subseteq E$ be a matching. Let $P$ be an augmenting path in $G$, with respect
to $U$, and let $E_P$ be the edges of $P$. Let $U' \subseteq E$ be the union of the set of edges
in $U$ but not in $E_P$ and the set of edges in $E_P$ but not in $U$:

$$U' = (U \setminus E_P) \cup (E_P \setminus U)$$

Then $U'$ is a matching in $G$, and $|U'| > |U|$.

For example, the augmenting path in Equation D.1 can be used to increase the
matching shown in Figure D.1a, by replacing each bold edge in the path with a light
edge, and each light edge in the path with a bold edge. The resulting matching, of a
higher cardinality, is shown in Figure D.1b.
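
The edge swap of Lemma D.1 is a symmetric difference, which can be illustrated with a short Python sketch. The function name `augment` and the encoding of edges as (x, y) pairs are assumptions made for this example. Since the figure itself is not reproduced here, the starting matching below contains only the three matched edges implied by the path in Equation D.1; any matched edges off the path would be left untouched.

```python
def augment(matching, path):
    """Return the matching obtained by flipping the edges along `path`.

    matching: set of (x, y) pairs
    path:     node sequence x, y, x, y, ... as in Definition D.3
    """
    path_edges = set()
    for i in range(len(path) - 1):
        a, b = path[i], path[i + 1]
        # Store every edge as (x-node, y-node), whichever way it is walked.
        path_edges.add((a, b) if i % 2 == 0 else (b, a))
    return matching ^ path_edges  # (U \ E_P) union (E_P \ U)


path = ["x4", "y2", "x2", "y3", "x3", "y1", "x1", "y4"]
before = {("x2", "y2"), ("x3", "y3"), ("x1", "y1")}
after = augment(before, path)
print(sorted(after))
```

The path contributes four edges outside the matching and three inside it, so the flip raises the cardinality by exactly one.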

The  above  lemma  shows  how  we  can  increase  the  cardinality  of  a  matching  U  by 
first  identifying  an  augmenting  path  with  respect  to  U.  The  next  lemma  tells  us  that 
if no such augmenting path exists, then the matching is of maximum cardinality, i.e.,
it  is  a  maximum  matching. 

Lemma D.2 Let $G = (X, Y, E)$ be a bipartite graph, and let $U \subseteq E$ be a matching
in $G$ such that there is no augmenting path with respect to $U$. Then $U$ is a maximum
matching.

The above lemma is a special case of a similar result of network flow algorithms.
The interested reader is referred to any textbook on network flow algorithms, e.g.,
[Cormen et al., 1990; Even, 1979], for the details.

Lemmas D.1 and D.2 give us the following algorithm for finding a maximum
matching in a bipartite graph $G = (X, Y, E)$:

    function find-maximum-matching(G)
        Let U be any matching in G
        while there is an augmenting path P, wrt U
            Use P to increase the cardinality of U
        return U
    end

Figure D.2: Algorithm for finding a maximum matching

The initial matching $U$ can be any matching, including the empty set. The
complexity of the above algorithm depends on the algorithm used to find augmenting
paths. For practical purposes, we have found that a straightforward depth first search
gives adequate results, though faster algorithms are possible, e.g., Dinic’s network flow
algorithm described in [Even, 1979]. If we use Dinic’s algorithm for finding augmenting
paths, the complexity of the above algorithm is $O(|V|^{1/2}|E|)$, where $V = X \cup Y$
is the set of nodes and $E$ is the set of edges in $G$ (see [Even, 1979] for the details).
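
For concreteness, the depth first search variant can be sketched in Python as follows. This is a minimal sketch and not the code used in this work: the function name, the adjacency-dictionary input, and the y-to-x representation of the matching are assumptions. It starts from the empty matching and searches for an augmenting path from each node in $X$ in turn, flipping the edges of a path as the recursion unwinds.

```python
def find_maximum_matching(xs, edges):
    """Maximum matching by repeated depth-first search for augmenting paths.

    xs:    iterable of the nodes in X
    edges: dict mapping each x to the list of y nodes adjacent to it
    Returns a dict mapping each matched y to its partner x.
    """
    match_of_y = {}  # the current matching U, stored as y -> x

    def try_augment(x, visited):
        # Leave x by an edge not in U; either arrive at a free y (the path
        # ends) or follow y's matched edge back into X and keep searching.
        for y in edges.get(x, ()):
            if y in visited:
                continue
            visited.add(y)
            if y not in match_of_y or try_augment(match_of_y[y], visited):
                match_of_y[y] = x  # flip the edges along the path found
                return True
        return False

    for x in xs:  # each unmatched x is a candidate start of an augmenting path
        try_augment(x, set())
    return match_of_y


# Hypothetical graph containing the path of Equation D.1.
graph = {"x1": ["y1", "y4"], "x2": ["y2", "y3"], "x3": ["y3", "y1"], "x4": ["y2"]}
print(find_maximum_matching(["x1", "x2", "x3", "x4"], graph))
```

Each search visits every edge at most once, so this simple variant takes time proportional to $|X| \cdot |E|$ in the worst case, which is slower than the bound quoted above for Dinic’s algorithm but is often adequate for small graphs.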